New Bounds for Matching Vector Families 

Abhishek Bhowmick* Zeev Dvir† Shachar Lovett‡



Abstract 

A Matching Vector (MV) family modulo $m$ is a pair of ordered lists $U = (u_1, \ldots, u_t)$ and $V = (v_1, \ldots, v_t)$ where $u_i, v_j \in \mathbb{Z}_m^n$ with the following inner product pattern: for any $i$, $\langle u_i, v_i \rangle = 0$, and for any $i \ne j$, $\langle u_i, v_j \rangle \ne 0$. A MV family is called $q$-restricted if the inner products $\langle u_i, v_j \rangle$ take at most $q$ different values.

Our interest in MV families stems from their recent application in the construction of sub-exponential locally decodable codes (LDCs). There, $q$-restricted MV families are used to construct LDCs with $q$ queries, and there is special interest in the regime where $q$ is constant. When $m$ is a prime it is known that such constructions yield codes with exponential block length. However, for composite $m$ the behaviour is dramatically different. A recent work by Efremenko [Efr09] (based on an approach initiated by Yekhanin [Yek08]) gives the first sub-exponential LDC with constant queries. It is based on a construction of a MV family of super-polynomial size by Grolmusz [Gro00] modulo composite $m$.

In this work, we prove two lower bounds on the block length of LDCs which are based on black box constructions using MV families. When $q$ is constant (or sufficiently small), we prove that such LDCs must have a quadratic block length. When the modulus $m$ is constant (as it is in the construction of Efremenko [Efr09]) we prove a super-polynomial lower bound on the block length of the LDCs, assuming a well-known conjecture in additive combinatorics, the polynomial Freiman-Ruzsa conjecture over $\mathbb{Z}_m$.

1 Introduction



A Matching Vector Family (MV family) is a combinatorial object that arises in several contexts including Ramsey graphs, weak representation of OR polynomials and, recently, constant query locally decodable codes (LDCs). It is defined by two ordered lists $U = (u_1, \ldots, u_t)$ and $V = (v_1, \ldots, v_t)$ where $u_i, v_j \in \mathbb{Z}_m^n$ and $m$ and $n$ are integers greater than 1. The property that the two lists have to satisfy is the following: for all $i \in [t]$, $\langle u_i, v_i \rangle = 0 \pmod m$, whereas for all $i \ne j \in [t]$, $\langle u_i, v_j \rangle \ne 0 \pmod m$. By $\langle \cdot, \cdot \rangle$ we denote the standard inner product. Let us call this the standard definition of a MV family. If, in addition, all the inner products $\langle u_i, v_j \rangle \pmod m$ lie in a set of size $q$, then it is called a $q$-restricted MV family. Note that $q = m$ corresponds to the standard MV family. The size of the MV family is $t$, the number of vectors in each list. In this paper, we prove upper bounds on $q$-restricted MV families in the first part and on standard MV families in the later part.

Let $MV(m,n)$ denote the largest $t$ such that there exists a MV family of size $t$ in $\mathbb{Z}_m^n$. Analogously, let $MV(m,n,q)$ denote the largest $t$ such that there exists a $q$-restricted MV family of size $t$ in $\mathbb{Z}_m^n$. The question of bounding $MV(m,n)$ (or $MV(m,n,q)$) is closely related to the well-known combinatorial problem of set systems with restricted modular intersections [BF98, Sga99, Gro00, Gro02] (in this setting the vectors $u_i, v_i$ are required to have entries that are either 0 or 1). The systematic study of this more general problem, in the context of MV codes, was initiated in [DGY11]. The setting of prime $m$ is well understood. For large prime $m = p$, it is known that $MV(p,n) = O(p^{n/2})$ [DGY11]. In fact, this is almost tight. When $m$ is a small prime, we again have a tight upper bound of $O(n^{p-1})$ [BF98]. Surprisingly, the setting of small composite $m$ leads to very useful constructions of Ramsey graphs and constant query LDCs. This is due to a construction by Grolmusz [Gro00] of a MV family over $\mathbb{Z}_6$ of super-polynomial size, in contrast to the polynomial upper bound when $m$ is a small prime. Thus, it is interesting to study the behavior of MV families for small composite $m$ and, more generally, for arbitrary composites. We will see later the connection between upper bounds on $MV(m,n,q)$ and lower bounds on the encoding lengths of MV codes (a family of LDCs). For general $m$, the best upper bound known [DGY11] is $MV(m,n) \le m^{n-1+o_m(1)}$, with $o_m(1)$ denoting a function that goes to zero when $m$ grows. It was conjectured in [DGY11] that an upper bound of roughly $m^{n/2}$ should hold for any $m$ (not just prime). This would be tight for large $m$, as there are constructions of MV families almost meeting this bound [YGK12]. However, the proof method used in [DGY11] to prove the $O(p^{n/2})$ bound does not extend to non-primes. In this work, we prove the conjecture for $q$-restricted MV families in $\mathbb{Z}_m^n$, for any $m$, as long as $q$ is sufficiently small compared to $n \log m$ (see Theorem 1). When $m = p$ is a fixed small prime, it follows from [BF98] that $MV(p,n) = O(n^{p-1})$. On the other hand, when $m$ is a fixed composite, say $m = 6$, there exists a MV family of super-polynomial size $\exp\left(\Omega\left(\log^2 n / \log\log n\right)\right)$ [Gro00]. In such a case we prove a stronger upper bound on $MV(m,n)$ than the one given by Theorem 1, assuming a well-known conjecture in additive combinatorics (see Theorem 2). Table 1 lists the known and new upper bounds on MV families.

Theorem 1. For all $m \ge 2$, $n \ge 1$ we have

$$MV(m,n,q) \le q^{O(q \log q)}\, m^{n/2}.$$

Hence, Theorem 1 resolves the conjecture of [DGY11] for any $m$ whenever $q$ is small enough that $q^{O(q \log q)} = m^{o(n)}$. When $m \gg n$, our bound is quite close to the best known construction of MV families [YGK12], which almost meets the $m^{n/2}$ bound.

$m$                       $MV(m,n)$ or $MV(m,n,q)$
general prime             $MV(m,n) \le O(m^{n/2})$ [DGY11]
general composite         $MV(m,n,q) \le q^{O(q \log q)}\, m^{n/2}$ (Theorem 1)
small, fixed prime        $MV(m,n) \le O(n^{m-1})$ [BF98]
small, fixed composite    $MV(m,n) \le \exp(O_m(n/\log n))$ (Theorem 2 under Conjecture 1)

Table 1: List of upper bounds on $MV(m,n)$, $MV(m,n,q)$

Our second result assumes the polynomial Freiman-Ruzsa (PFR) conjecture (discussed below) and gives a stronger upper bound on the size of MV families when $m$ is a constant and $n$ grows.

Before we state the conjecture, we need to define what a difference set is. For an abelian group $G$ let $A \subseteq G$. Then the difference set is

$$A - A = \{a_1 - a_2 : a_1, a_2 \in A\}.$$

Conjecture 1 (PFR Conjecture in $\mathbb{Z}_m^n$). Suppose $A \subseteq \mathbb{Z}_m^n$ and $|A - A| \le X \cdot |A|$. Then one can find a subgroup $H$ of size at most $|A|$ such that $A$ can be covered by $X' = X^{c_m}$ many translates of $H$, where $c_m$ depends only on $m$.
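
To make the hypothesis $|A - A| \le X \cdot |A|$ concrete, the following Python sketch computes difference sets in $\mathbb{Z}_m^n$ by brute force. The helper name `difference_set` and the two toy sets are illustrative assumptions, not objects from the paper. A subgroup has ratio $|A - A|/|A| = 1$, while a small "generic" set has a larger ratio.

```python
from itertools import product

def difference_set(A, m):
    """Return A - A = {a1 - a2 : a1, a2 in A}, computed coordinate-wise mod m."""
    return {tuple((x - y) % m for x, y in zip(a, b)) for a in A for b in A}

m = 6

# A subgroup of Z_6^2: H = {0, 3}^2. Subgroups are closed under subtraction.
H = set(product((0, 3), repeat=2))
H_diff = difference_set(H, m)

# A small set that is not a subgroup.
A = {(0, 0), (1, 0), (0, 1)}
A_diff = difference_set(A, m)

print(len(H_diff) / len(H))  # doubling ratio of the subgroup
print(len(A_diff) / len(A))  # doubling ratio of the non-subgroup
```

In this toy run the subgroup satisfies $H - H = H$, whereas the three-element set has seven distinct differences.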

We note that the PFR conjecture has already found several applications in computer science. Ben-Sasson and Zewi [BSZ11] used it to construct two-source extractors from affine extractors, and Ben-Sasson, Lovett and Zewi [BSLZ11] used it to bound the deterministic communication complexity of functions whose corresponding matrix has low rank. Our work provides another application of the PFR conjecture and demonstrates its wide-reaching applicability. We further note that a quasi-polynomial version of the PFR conjecture was recently proved by Sanders [San10] (see also the exposition in [Lov12]). Unfortunately, all the applications discussed above require the truly polynomial version of the conjecture, and so cannot rely on Sanders' result.

We now state the second theorem. 
Theorem 2. Assuming the PFR conjecture over $\mathbb{Z}_m^n$ (Conjecture 1), we have

$$MV(m,n) \le \exp\left(c(m) \cdot \frac{n}{\log n}\right),$$

with $c(m)$ an explicit function of $m$.

From a technical point of view, one of the ingredients in this work builds on the recent work of Ben-Sasson, Lovett and Zewi [BSLZ11], who used the PFR conjecture to show that matrices over $\mathbb{Z}_2$ with large bias (say, with many more ones than zeros) and small rank must contain a large monochromatic sub-matrix. An important ingredient in our proof is a generalization of their results from $\mathbb{Z}_2$ to $\mathbb{Z}_m$ for all $m$, not necessarily prime. We note, however, that this is just one ingredient in our overall proof.

1.1 Lower Bounds on LDCs: Motivation for MV Family



Locally Decodable Codes (LDCs) are a special kind of Error Correcting Codes (ECCs) that allow the receiver to decode a single symbol of the message by querying a small number of positions in a corrupted encoding. More formally, a $(q, \delta, \epsilon)$-LDC encodes $K$-symbol messages $x$ to $N$-symbol codewords $C(x)$, such that for every $i \in [K]$, the symbol $x_i$ can be recovered with probability $1 - \epsilon$ by a randomized decoding procedure that makes only $q$ queries, even if the codeword $C(x)$ is corrupted in up to $\delta N$ locations. Since the early 90's, LDCs have found exciting applications in various areas ranging from data transmission to complexity theory to cryptography/privacy. We refer the reader to [Tre04, Yek11] for more background.

A central research question, which is far from being solved, has to do with understanding the best possible 'stretch' of an LDC with a constant number of queries. That is, how large $N$ has to be as a function of $K$ for constant $q$ and with constant $\delta, \epsilon$ (these last two parameters are not our focus here and we will generally assume them to be small fixed constants). For $q = 1, 2$ this question is completely answered. There are no LDCs for $q = 1$ [KT00] and the best LDCs with $q = 2$ have exponential encoding length [GKST02, KdW04]. For $q > 2$ there are huge gaps in our understanding. Katz and Trevisan were the first to study this problem [KT00] and, today, the best general lower bounds on $N$ are slightly super-linear bounds of the form $\tilde{\Omega}\left(K^{1 + 1/(\lceil q/2 \rceil - 1)}\right)$ [Woo07]. Notice that, when the number of queries is 3 or 4, these bounds are quadratic (see also [KdW04, Woo10] for the $q = 3, 4$ case). The upper bounds were, until recently, those coming from polynomial codes and were of the order of $N \le \exp\left(K^{1/(q-1)}\right)$. Improved upper bounds, breaking this barrier slightly, were given in [BIKR02].

This state of affairs changed dramatically when, in a breakthrough paper, Yekhanin [Yek08] developed a new approach for constructing LDCs, called MV codes, that have much shorter codeword length than polynomial codes. Efremenko [Efr09] was the first to show that this approach could yield codes with subexponential encoding length (Yekhanin's paper showed this under a number theoretic assumption). More refinements and improvements to this new framework were obtained in [Rag07, KY09, IS10, CFL+10, DGY11, BET10] to give LDCs with $q$ queries and with encoding length that grows, when $q$ is a constant, roughly like

$$N \sim \exp\exp\left((\log K)^{O(1/\log q)} (\log\log K)^{1 - 1/\log q}\right).$$



While significantly smaller than the length of polynomial codes, the codeword length of these new codes is still super-polynomial in $K$. The most general setting of parameters was addressed in [DGY11], where the authors gave a black box construction of $q$-query MV codes using $q$-restricted MV families in $\mathbb{Z}_m^n$. Using the standard definition of MV families, this implies $m$-query MV codes using MV families in $\mathbb{Z}_m^n$. In this basic, yet general, reduction, it was shown that upper bounds on MV families lead to lower bounds on the encoding length of MV codes. With this motivation in mind, the authors of [DGY11] conjectured an upper bound on the size of MV families which would lead to lower bounds on the encoding length of MV codes under the basic framework. We note that Yekhanin [Yek08] used restricted MV families in $\mathbb{Z}_p^n$ where $p$ is a very large Mersenne prime, and used a specialized technique to reduce the number of queries from $p$ to 3. Another instance of reducing the number of queries below what the standard construction gives is due to Efremenko [Efr09], who again used restricted MV families. A certain gadget was discovered using computer search, whereby the author worked in $\mathbb{Z}_{511}$ but brought the number of queries down to 3 from the basic bound of 511.

The following is a corollary of Theorem 1.

Corollary 3. For an arbitrary positive integer $m$, consider an infinite family of $q$-query Matching Vector codes $C_n : \mathbb{F}^k \to \mathbb{F}^N$ for $n \in \mathbb{N}$, where $k(n)$ and $N(n)$ are growing functions of $n$, constructed using the black box reduction from a $q$-restricted Matching Vector Family in $\mathbb{Z}_m^n$ ([DGY11]). For large enough $n$, if $q$ is small enough that $q^{O(q \log q)} = m^{o(n)}$, then

$$N \ge k^{2 - o(1)}.$$

Specifically, if $q = O(1)$, then $N = \Omega(k^2)$.

Next we have the following corollary of Theorem 2.

Corollary 4. For an arbitrary positive integer $m$, assume the PFR conjecture over $\mathbb{Z}_m^n$ (Conjecture 1). Consider an infinite family of $m$-query Matching Vector codes $C_n : \mathbb{F}^k \to \mathbb{F}^N$ for $n \in \mathbb{N}$, where $k(n)$ and $N(n)$ are growing functions of $n$, constructed using the black box reduction from a standard Matching Vector Family in $\mathbb{Z}_m^n$ ([DGY11]). For large enough $n$, if $m = O(1)$, then

$$N = \exp\left(\Omega_m(\log k \log\log k)\right).$$

Thus Corollary 4 states that, assuming Conjecture 1, MV codes with a constant number of queries must have super-polynomial encoding length in the basic framework. Note that we get the same bound in Efremenko's framework for 3 queries. This is because the super-polynomial bound only assumes a constant $m$, and in Efremenko's setting $m = 511$ is again a constant. (He uses $\mathbb{Z}_{511}$ to construct the MV family and further reduces the number of queries to 3.) This essentially means that in order to construct polynomial length codes, one needs to construct MV families in $\mathbb{Z}_m^n$ for non-constant $m$ and use some specialized gadget to reduce the number of queries. One way is to ensure that it is a $q$-restricted MV family with constant $q$. This automatically ensures $q$-query decoding. However, the quadratic lower bound continues to hold even in this scenario for constant $q$. To beat the quadratic lower bound for constant query MV codes, one needs to construct $q$-restricted MV families for growing $m$, with $q$ too large for Theorem 1 to yield an $m^{n/2 + o(n)}$ bound, and then develop some special gadget to bring the number of queries down further from $q$ to some constant.

1.2 Proof Overview 

The proof of Theorem 1 relies on intuitions coming from the theory of two-source extractors [CG88], which are functions of two variables $F(X, Y)$ such that the output of $F$ is distributed in a close-to-uniform fashion whenever the two inputs are drawn, independently, from two distributions of sufficiently high entropy. Since our proof does not use two-source extractors explicitly, we do not define them formally and just use them to explain the high level idea behind the proof. It is a well known fact [CG88] that the inner product function $F(X,Y) = \langle X, Y \rangle$, say over $\mathbb{Z}_2^n \times \mathbb{Z}_2^n$, is a good two-source extractor when the two inputs $X$ and $Y$ are both drawn uniformly from sets $S_X, S_Y \subseteq \mathbb{Z}_2^n$ of size larger than $2^{n/2}$. This immediately suggests a connection to MV families since, if we take $S_X = U$ and $S_Y = V$ for a MV family $U, V$ in $\mathbb{Z}_2^n$, we would get a completely non-uniform output (it will be zero with exponentially small probability). This means that the size of $U, V$ is bounded from above by approximately $2^{n/2}$.

If we try to use a similar argument over $\mathbb{Z}_m$ we run into trouble, since the inner product function modulo $m$ is not a good two-source extractor for sources of size $m^{n/2}$. Take, for example, $S_X = S_Y = \{0, 2, 4\}^n \subseteq \mathbb{Z}_6^n$ and observe that $\langle X, Y \rangle$ is always divisible by 2 and so is far from being uniformly distributed over $\mathbb{Z}_6$. It is, however, possible to show that this example is, in some sense, the only example and that, in general, we can always find a certain number of elements of either $S_X$ or $S_Y$ that 'agree' modulo some factor of $m$. This observation suggests proving Theorem 1 by induction on the number of factors of $m$, which is the way we proceed.
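
The counterexample above can be verified by brute force for a small dimension. The sketch below is illustrative only (the choice $n = 3$ and the helper name `inner_mod` are assumptions made for the demonstration): it enumerates $S = \{0,2,4\}^n \subseteq \mathbb{Z}_6^n$, which has more than $6^{n/2}$ elements, and collects every inner product value that occurs.

```python
from itertools import product

def inner_mod(x, y, m):
    """Standard inner product of x and y, reduced modulo m."""
    return sum(a * b for a, b in zip(x, y)) % m

n, m = 3, 6
S = list(product((0, 2, 4), repeat=n))  # S_X = S_Y = {0, 2, 4}^n

# All values taken by <x, y> mod 6 over S x S.
values = {inner_mod(x, y, m) for x in S for y in S}

print(len(S), m ** (n / 2))  # source size versus m^(n/2)
print(sorted(values))        # only even residues appear
```

Although the source is larger than $m^{n/2}$, the odd residues 1, 3, 5 are never hit, so the output cannot be close to uniform on $\mathbb{Z}_6$.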

The proof of Theorem 2 uses a slightly different view of MV families as matrices with a certain zero/non-zero pattern and small rank. Specifically, for a MV family $U, V$ of size $t$ in $\mathbb{Z}_m^n$ consider the $t \times t$ matrix $P$ whose $(i,j)$'th entry is $\langle u_i, v_j \rangle \bmod m$. The definition of a MV family implies that $P$ has zeros on the diagonal and non-zeros everywhere else. If $m$ were a prime, we could think of $\mathbb{Z}_m$ as a field $\mathbb{F}$ and say that, since $P$ is the inner product matrix of vectors of length $n$ over a field, it must have rank at most $n$. Conversely, every $t \times t$ matrix over a field $\mathbb{F}$ with these properties (zero on the diagonal and non-zero off the diagonal) and with rank $n$ gives a MV family of size $t$ in $\mathbb{F}^n$. We call a matrix with this pattern of zeros/non-zeros an MV matrix. Thus, when $m$ is prime, the question of bounding the size of a MV family is the same as lower bounding the rank of an MV matrix.¹ When $m$ is composite, this whole approach should be re-examined, since $\mathbb{Z}_m$ is no longer a field and our familiar understanding of matrices and linear algebra over a field is no longer valid. We do, however, manage to carry over this correspondence between the two problems by defining the notion of rank in a careful way (more on this issue below).

Assume for the purpose of this overview that the usual notion of rank and other intuitions from linear algebra are valid over $\mathbb{Z}_m$, and let us proceed with sketching the proof of Theorem 2 using the equivalent formulation as bounding (from below) the rank of a MV matrix $P$. The starting point is a generalization of a result of [BSLZ11], mentioned above, from $\mathbb{Z}_2$ to $\mathbb{Z}_m$. We show that every matrix $P$ over $\mathbb{Z}_m$ that is biased (i.e., its values are not distributed close to uniformly) and has low rank contains a large monochromatic sub-matrix modulo some factor $m'$ of $m$. The size of the sub-matrix is bounded from below by $\approx |P| \exp(-r'/\log(r'))$, where $r'$ is the rank of $P$ modulo $m'$ (this factor depends on the specific way the matrix is biased). This generalizes the result of [BSLZ11], which assumes $m = 2$ and finds a large monochromatic sub-matrix (modulo 2). Let us refer to this result from now on as the sub-matrix lemma. We note that the sub-matrix lemma is the only component in the proof that relies on the PFR conjecture. We can apply the sub-matrix lemma to a MV matrix $P$ since its values are far from uniform (the probability of zero is much less than $1/m$) and since its rank is assumed (towards a contradiction) to be low.

Suppose for the sake of simplicity that $m = p \cdot q$, with $p, q$ distinct primes (the proof for general $m$ is significantly more technical but relies on the same basic intuitions). Applying the sub-matrix lemma we obtain a sub-matrix $P_1$ of $P$ that is constant modulo some factor $m_1$ of $m$ (so $m_1$ is either $p$, $q$ or $m$) of size at least $|P| \exp(-r_1/\log(r_1))$, where $r_1 \le n$ is the rank of $P$ mod $m_1$. Using some matrix manipulations, and subtracting a rank one matrix, we can get a large sub-matrix $P_1'$ that does not intersect the diagonal of $P$ and such that all of the entries of $P_1'$ are zero modulo $m_1$.

¹ For technical reasons, the actual proof will not use matrices exclusively and will keep the MV family in the background. This is because we need to keep certain invariants throughout the proof and these are easier to define for families of vectors than for matrices.
Suppose $|P_1'| = t_1$ and consider the $2t_1 \times 2t_1$ sub-matrix $P''$ of $P$ that has $P_1'$ as its top-right (or bottom-left) block and such that the top-left and bottom-right blocks are taken to have zero diagonal elements. Formally, if $P_1'$ is indexed by rows in $R$ and columns in $T$ with $R \cap T = \emptyset$, then the rows/columns of $P''$ will be indexed by $R \cup T$. If we consider the matrix $P''$ modulo $m_1$ then its top-right block is all zero, and so its rank (modulo $m_1$) will be the sum of the ranks of the top-left and bottom-right blocks. Thus, one of these blocks, w.l.o.g. the top-left one, must have rank at most $n/2$ (over $\mathbb{Z}_{m_1}$). Notice also that both of these blocks are themselves MV matrices modulo $m$, since they are sub-matrices of $P$ with the same row and column sets. Let $P_1$ be the top-left block of $P''$. We can now apply, again, the monochromatic sub-matrix lemma to find a large sub-matrix $P_2$ of $P_1$ which is constant modulo some other factor $m_2$ of $m$. The size of $P_2$ will be

$$t_1 \cdot \exp(-r_2/\log(r_2)) = |P| \cdot \exp\left(-r_1/\log(r_1) - r_2/\log(r_2)\right).$$

The factor $m_2$ is also either $p$ or $q$. If it happens to be that $m_1 = m_2$ then $r_2 \le n/2$, and so we gain in the size of $P_2$ in this second step (the expression $r_2/\log(r_2)$ is smaller than $(n/2)/\log(n/2)$, which is smaller by roughly a factor of two than our bound on $r_1/\log(r_1)$). Suppose we continue with this iterative process of finding constant sub-matrices for $\ell$ steps and that, by luck, all the factors $m_1, m_2, \ldots$ are equal to the same factor of $m$ (say $p$). Then, after roughly $\log(n)$ iterations, we will reduce the rank modulo $p$ to one and still have at least

$$|P| \cdot \exp\left(-\sum_{i} \frac{n/2^i}{\log(n/2^i)}\right)$$

rows, which is close to the original size of $P$ if we assume (towards a contradiction) that $|P| \gg \exp(n/\log n)$ (this calculation is given in Claim A.1). In this case we obtain a new large MV family $U', V'$ modulo $m$ such that all inner products $\langle u_i', v_j' \rangle$ of elements $u_i' \in U'$, $v_j' \in V'$ are fixed modulo $p$. From this we can easily construct a MV family of roughly the same size in $\mathbb{Z}_q^n$ and then use the bounds on $MV(q,n)$ for primes to get a contradiction. In the 'unlucky' case we will have different factors $m_1, m_2, \ldots$ in each stage, but we can adapt the analysis to consider the decrease in rank simultaneously for all factors of $m$.

The full proof is by induction on the number of factors of m and uses the iterative sub-matrix 
argument above to go from a MV family modulo m to a MV family of roughly the same size modulo 
some proper factor of m (and then uses the inductive hypothesis on this new MV family). 



1.3 Matrix rank over $\mathbb{Z}_m$

An important technical issue, which was already hinted at above, is the definition of the rank of a matrix with entries in a ring $\mathbb{Z}_m$. There are two main properties of matrix rank over a field that we relied on in the proof sketch above. The first is that a rank $r$ matrix is always the inner product matrix of vectors in $r$ dimensions. Equivalently, a $t \times t$ matrix of rank $r$ can be written as a product of a $t \times r$ matrix and an $r \times t$ matrix. This is important if we are to go back and forth between matrices and MV families. The second property we used is that, if we have a $2t \times 2t$ matrix composed of 4 blocks of size $t \times t$ and the top-right block is zero, then the rank of the matrix is the sum of the ranks of the top-left block and the bottom-right block.

Ideally, we would like to define rank over $\mathbb{Z}_m$ so that both properties are satisfied. This is, however, impossible, as the following example shows. Consider the $2 \times 2$ matrix with the two rows $(4, 0)$ and $(0, 3)$ over $\mathbb{Z}_6$. This matrix can be written as the product of the two vectors $(2, 3)^T$ and $(2, 3)$ and so should have rank one, if we are to satisfy the first property. However, if we are to satisfy the second property, its rank should be the sum of the ranks of the two $1 \times 1$ matrices $(4)$ and $(3)$, which clearly cannot have rank zero!
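
The factorization in this example is a one-line computation modulo 6. The following Python sketch (with a hypothetical helper `matmul_mod`, introduced only for illustration) multiplies the column vector $(2,3)^T$ by the row vector $(2,3)$ over $\mathbb{Z}_6$.

```python
def matmul_mod(A, B, m):
    """Product of matrices A and B (lists of rows), with entries reduced mod m."""
    inner_dim = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(inner_dim)) % m
             for j in range(len(B[0]))]
            for i in range(len(A))]

column = [[2], [3]]   # the 2 x 1 matrix (2, 3)^T
row = [[2, 3]]        # the 1 x 2 matrix (2, 3)

M = matmul_mod(column, row, 6)
print(M)  # the diagonal matrix with rows (4, 0) and (0, 3)
```

The off-diagonal entries vanish because $2 \cdot 3 = 6 \equiv 0 \pmod 6$, a product of zero divisors that has no analogue over a field.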

Our solution to this problem is to give two different definitions of rank, each satisfying one of the two properties. We then show that the two definitions of rank can differ from each other by a multiplicative factor of $\log m$, which our proof can handle. The first definition of rank is as the smallest $r$ such that our $t \times t$ matrix can be written as a product of a $t \times r$ matrix and an $r \times t$ matrix. Clearly this satisfies the first property (but not the second). The second definition of rank is termed column-rank and is defined as the logarithm to the base $m$ of the size of the additive subgroup of $\mathbb{Z}_m^t$ generated by the columns of the matrix. Notice that this definition can result in the rank being non-integer. For example, the rank of the matrix with a single column $(2,0)$ over $\mathbb{Z}_6$ would be equal to $\log_6(3)$, since the subgroup generated by this column is composed of the three vectors $(2,0), (4,0), (0,0)$. It is not hard to show (see Claim 4.3) that this definition satisfies the second property described above regarding block matrices. Clearly, the two definitions agree for matrices over a field. We show (see Claim 4.6) that the two notions of rank can differ by a multiplicative factor of at most $\log m$. This allows us to use both definitions in different parts of the proof without losing too much in the transition. We finish this discussion by noting that in no part of the proof do we use the characterization of rank using determinants, which is often very useful when working over a field.
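
Column-rank can be computed by brute force for tiny matrices by enumerating the additive subgroup generated by the columns. The sketch below is illustrative: the function names `column_span` and `column_rank` are our own, and the closure loop is a naive method that is only practical for very small $m$ and $t$.

```python
import math

def column_span(columns, m):
    """Additive subgroup of Z_m^t generated by the given columns (tuples)."""
    t = len(columns[0])
    span = {tuple([0] * t)}
    frontier = list(span)
    while frontier:
        v = frontier.pop()
        for c in columns:
            w = tuple((a + b) % m for a, b in zip(v, c))
            if w not in span:      # new element: keep closing under addition
                span.add(w)
                frontier.append(w)
    return span

def column_rank(columns, m):
    """log base m of the size of the subgroup generated by the columns."""
    return math.log(len(column_span(columns, m)), m)

# Single column (2, 0) over Z_6: the subgroup is {(0,0), (2,0), (4,0)}.
print(sorted(column_span([(2, 0)], 6)))
print(column_rank([(2, 0)], 6))            # log_6(3)

# The diagonal matrix with rows (4, 0), (0, 3) and its two 1 x 1 blocks.
print(column_rank([(4, 0), (0, 3)], 6))
print(column_rank([(4,)], 6) + column_rank([(3,)], 6))
```

On the diagonal example, the column-rank of the whole matrix equals the sum of the column-ranks of the blocks $(4)$ and $(3)$, namely $\log_6 3 + \log_6 2 = 1$, which is the block-additivity property in this instance.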

1.4 Organization 

We begin with some preliminaries in Section 2. We prove Theorem 1 in Section 3. Section 4 contains some claims about matrices over $\mathbb{Z}_m$. Section 5 introduces collision free MV families. Both Section 4 and Section 5 will be used in the proof of Theorem 2 in Section 6. The proof of Theorem 2 also requires the sub-matrix lemma, whose proof appears in Section 7.

2 General preliminaries 

Notations: Throughout the paper we will be handling ordered lists of elements. A list $A$ of size $t$ over a finite set $\Omega$ is an ordered $t$-tuple $A = (a_1, a_2, \ldots, a_t)$ where each $a_i \in \Omega$. A list can have repetitions. If it doesn't, we say it is twin free. When discussing sublists $A \subseteq B$ with $B = (b_1, \ldots, b_t)$ we will use the convention that, unless specified otherwise, $A$ maintains the ordering induced by $B$. For a positive integer $t$, we let $[t]$ denote the list $(1, \ldots, t)$. So, for example, when we say that $T \subseteq [t]$ we mean that $T$ is a list of integers in increasing order belonging to $[t]$. We say that a list $A = (a_1, \ldots, a_t)$ over $\Omega$ is constant if $a_i = a_j$ for all $i, j \in [t]$. We assume all logarithms are in base 2 unless otherwise specified.

2.1 MV Families: Basic Facts and Definitions 

We now start with some basic definitions and claims regarding MV families.

Definition 2.1 (Matching Vector Family). Let $U = (u_1, u_2, \ldots, u_t)$ and $V = (v_1, v_2, \ldots, v_t)$ be lists over $\mathbb{Z}_m^n$. Then $(U, V)$ is called a matching vector family of size $t$ in $\mathbb{Z}_m^n$ if

• $\langle u_i, v_i \rangle = 0 \pmod m$, $\forall i$.

• $\langle u_i, v_j \rangle \ne 0 \pmod m$, $\forall i \ne j$.

If, in addition, $|\{\langle u, v \rangle : u \in U, v \in V\}| = q$, we call such a MV family a $q$-restricted MV family. We denote the size of $(U,V)$ by $|(U,V)|$. For instance, $|(U,V)| = t$ above.
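
Definition 2.1 can be checked mechanically on small instances. The following Python sketch is illustrative only: the helper names `inner`, `is_mv_family` and `num_values`, as well as the toy family in $\mathbb{Z}_6^2$, are assumptions made for the example and do not come from any construction cited in the text.

```python
def inner(u, v, m):
    """Standard inner product of u and v, reduced modulo m."""
    return sum(a * b for a, b in zip(u, v)) % m

def is_mv_family(U, V, m):
    """True iff <u_i, v_j> = 0 (mod m) exactly when i == j."""
    if len(U) != len(V):
        return False
    t = len(U)
    for i in range(t):
        for j in range(t):
            is_zero = inner(U[i], V[j], m) == 0
            if is_zero != (i == j):
                return False
    return True

def num_values(U, V, m):
    """Number of distinct values taken by the inner products <u, v> mod m."""
    return len({inner(u, v, m) for u in U for v in V})

# A toy family of size t = 3 in Z_6^2.
U = [(1, 0), (0, 1), (1, 1)]
V = [(0, 1), (1, 0), (1, 5)]

print(is_mv_family(U, V, 6))  # the zero/non-zero pattern holds
print(num_values(U, V, 6))    # inner products take the values 0, 1, 5
```

In the terminology above, this toy family has $|(U,V)| = 3$ and is $3$-restricted.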

Definition 2.2 (Subset of Matching Vector Family). Let $U = (u_1, u_2, \ldots, u_t)$, $V = (v_1, v_2, \ldots, v_t)$ form a matching vector family in $\mathbb{Z}_m^n$ of size $t$. By $(U', V') \subseteq (U, V)$, we mean that there exists a sublist $T \subseteq [t]$ such that $U' = (u_i : i \in T)$, $V' = (v_i : i \in T)$. Observe that $(U', V')$ is a matching vector family in $\mathbb{Z}_m^n$.

Definition 2.3 ($MV(m,n)$). We denote by $MV(m,n)$ the maximum size of a matching vector family $(U, V)$ in $\mathbb{Z}_m^n$. Similarly, we denote by $MV(m,n,q)$ the maximum size of a $q$-restricted matching vector family $(U,V)$ in $\mathbb{Z}_m^n$.

We shall use the following simple facts implicitly throughout the paper. 
Fact 2.4. $MV(m,n)$ is a non-decreasing function of $n$.

Proof. For $n_1 \le n_2$, we show $MV(m,n_1) \le MV(m,n_2)$. Given $(U, V)$, a matching vector family in $\mathbb{Z}_m^{n_1}$, we can pad each element in $U$ and $V$ by $n_2 - n_1$ zeros and obtain a matching vector family in $\mathbb{Z}_m^{n_2}$ of the same size. □
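
The padding argument can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the helpers `inner_matrix` and `pad` and the toy family are hypothetical, chosen only to show that appending zero coordinates leaves every inner product unchanged.

```python
def inner_matrix(U, V, m):
    """Matrix of inner products <u_i, v_j> mod m."""
    return [[sum(a * b for a, b in zip(u, v)) % m for v in V] for u in U]

def pad(vectors, n2):
    """Append zeros to each vector so that it has length n2."""
    return [tuple(v) + (0,) * (n2 - len(v)) for v in vectors]

# A toy MV family in Z_6^2, padded to Z_6^5.
U = [(1, 0), (0, 1), (1, 1)]
V = [(0, 1), (1, 0), (1, 5)]
U_padded, V_padded = pad(U, 5), pad(V, 5)

# Zero coordinates contribute nothing, so all inner products are preserved.
print(inner_matrix(U, V, 6) == inner_matrix(U_padded, V_padded, 6))
```

Since the whole inner product matrix is preserved, the padded lists form a matching vector family of the same size (and with the same set of inner product values) in the larger dimension.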

Fact 2.5. If $(U, V)$ is a matching vector family in $\mathbb{Z}_m^n$, then $U$ and $V$ are twin free.

Proof. Let $U = (u_1, u_2, \ldots, u_t)$, $V = (v_1, v_2, \ldots, v_t)$. We prove that $U$ is twin free. By symmetry $V$ is also twin free. Suppose $u_i = u_j$ for some $i \ne j$. Then $\langle u_i, v_j \rangle = \langle u_j, v_j \rangle = 0$, which is a contradiction. □

To facilitate writing in the proofs to follow we introduce the following notation for taking lists, 
matrices, etc. modulo an integer r. 

Definition 2.6 (Modulo $r$ notation). Let $2 \le r \le m$ be such that $r$ divides $m$. Given $a = (a_1, \ldots, a_n) \in \mathbb{Z}_m^n$, we denote by $a^{(r)} = (a_1 \pmod r, \ldots, a_n \pmod r) \in \mathbb{Z}_r^n$. For a list $U = (u_1, u_2, \ldots, u_t)$ over $\mathbb{Z}_m^n$, let $U^{(r)} = (u_1^{(r)}, \ldots, u_t^{(r)})$. Also, if $u^{(r)}$ is constant for all $u \in U$, we say $U^{(r)}$ is constant. Similarly, for a $t \times t$ matrix $M$ over $\mathbb{Z}_m$, define $M^{(r)}$ to be the $t \times t$ matrix over $\mathbb{Z}_r$ such that $M^{(r)}(j,k) = M(j,k) \pmod r$ for all $1 \le j, k \le t$.

We will also need the following definitions. 

Definition 2.7 (Bucket $B_r(w, A)$). Let $A \subseteq \mathbb{Z}_m^n$ be a list. For any $w \in \mathbb{Z}_r^n$, we denote by $B_r(w, A) = (a \in A : a^{(r)} = w)$ the sub-list of elements of $A$ which are equal to $w$ modulo $r$.
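
The modulo-$r$ notation and the bucket $B_r(w, A)$ translate directly into code. The Python sketch below is illustrative; the names `reduce_mod` and `bucket` and the sample list over $\mathbb{Z}_6^2$ are assumptions made for the example. Note that buckets are sub-lists: order and repetitions are preserved.

```python
def reduce_mod(a, r):
    """a^(r): reduce every coordinate of the vector a modulo r."""
    return tuple(x % r for x in a)

def bucket(w, A, r):
    """B_r(w, A): sub-list of elements of A equal to w modulo r."""
    return [a for a in A if reduce_mod(a, r) == tuple(w)]

# A list over Z_6^2 (with one repeated element), bucketed modulo r = 2.
A = [(1, 4), (3, 0), (5, 2), (1, 4), (2, 3)]

print([reduce_mod(a, 2) for a in A])
print(bucket((1, 0), A, 2))
print(bucket((0, 1), A, 2))
```

Every element of $A$ lands in exactly one bucket, so the buckets over all $w \in \mathbb{Z}_r^n$ partition the list.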

Definition 2.8 (Matrix $P_{U,V}$). Let $U = (u_1, u_2, \ldots, u_t)$ and $V = (v_1, v_2, \ldots, v_t)$ be lists over $\mathbb{Z}_m^n$. We let $P_{U,V}$ be the $t \times t$ matrix over $\mathbb{Z}_m$ defined by $P_{U,V}(i,j) = \langle u_i, v_j \rangle$ for $1 \le i, j \le t$.
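
For concreteness, the matrix $P_{U,V}$ of a small family can be written out explicitly. In the sketch below, the function names `inner_product_matrix` and `has_mv_pattern` and the toy lists over $\mathbb{Z}_6^2$ are illustrative assumptions.

```python
def inner_product_matrix(U, V, m):
    """P_{U,V}: the t x t matrix with (i, j) entry <u_i, v_j> mod m."""
    return [[sum(a * b for a, b in zip(u, v)) % m for v in V] for u in U]

def has_mv_pattern(P):
    """True iff P is zero on the diagonal and non-zero everywhere else."""
    t = len(P)
    return all((P[i][j] == 0) == (i == j) for i in range(t) for j in range(t))

U = [(1, 0), (0, 1), (1, 1)]
V = [(0, 1), (1, 0), (1, 5)]

P = inner_product_matrix(U, V, 6)
for row in P:
    print(row)
print(has_mv_pattern(P))
```

By construction $P_{U,V}$ is the product of the $t \times n$ matrix with rows $u_i$ and the $n \times t$ matrix with columns $v_j$, which is the factorization that the first notion of rank from Section 1.3 refers to.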

We will use the following lemma from [DGY11], mentioned informally in the introduction.

Lemma 2.9 ([DGY11, Theorem 21]). For any positive integer $n$ and prime $p$, $MV(p,n) \le 1 +$

2.2 Probability Distributions 

Definition 2.10. For a distribution $\mu$ over a finite set $\Omega$, we write $X \sim \mu$ to denote a random variable $X$ drawn according to $\mu$. We will also treat $\mu$ as a function $\mu : \Omega \to [0,1]$ such that $\mu(x) = \Pr[X = x]$. For a list $A$ over $\Omega$, $x \sim A$ denotes a point sampled as per the uniform distribution on $A$ (taking repetitions into account).

Definition 2.11 (Statistical distance between distributions). Let $\mu_1$ and $\mu_2$ be two distributions over a finite set $\Omega$. The statistical distance (or simply distance) between $\mu_1$ and $\mu_2$, denoted $\Delta(\mu_1, \mu_2)$, is defined as

$$\Delta(\mu_1, \mu_2) = \frac{1}{2} \sum_{x \in \Omega} |\mu_1(x) - \mu_2(x)|.$$

Definition 2.12 (Collision probability). Given a distribution $\mu$ over a finite set $\Omega$, the collision probability of $\mu$, denoted $cp(\mu)$, is defined as

$$cp(\mu) = \Pr_{x, y \sim \mu}[x = y] = \sum_{x \in \Omega} \mu(x)^2.$$
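
Both quantities are easy to compute exactly for small examples. The following Python sketch uses exact rational arithmetic; the helper names (`list_distribution`, `statistical_distance`, `collision_probability`) and the sample list over $\mathbb{Z}_4$ are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction

def list_distribution(A):
    """Distribution of x ~ A: uniform over the list A, counting repetitions."""
    counts = Counter(A)
    return {x: Fraction(c, len(A)) for x, c in counts.items()}

def statistical_distance(mu1, mu2):
    """Half the L1 distance between two distributions given as dicts."""
    support = set(mu1) | set(mu2)
    return sum(abs(mu1.get(x, 0) - mu2.get(x, 0)) for x in support) / 2

def collision_probability(mu):
    """Probability that two independent samples from mu coincide."""
    return sum(p * p for p in mu.values())

mu = list_distribution([0, 0, 1, 2])                 # a list over Z_4
uniform = {x: Fraction(1, 4) for x in range(4)}      # uniform on Z_4

print(statistical_distance(mu, uniform))
print(collision_probability(mu))
print(collision_probability(uniform))
```

The uniform distribution on a set of size $N$ attains the minimum possible collision probability $1/N$; a larger collision probability indicates a more concentrated distribution.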

The following two lemmas are standard and their proofs are included, for completeness, in Appendix B.

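Lemma 2.13. Let $\mu$ be a distribution over $\mathbb{Z}_m$ and let $\mathcal{U}_m$ denote the uniform distribution over $\mathbb{Z}_m$. If $\Delta(\mu, \mathcal{U}_m) \ge \epsilon$ then for some $1 \le j \le m - 1$,

$$\left| \mathbb{E}_{x \sim \mu}\left[\omega^{jx}\right] \right| \ge \frac{2\epsilon}{\sqrt{m}},$$

where $\omega = \exp(2\pi i/m)$ is a primitive root of unity of order $m$.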

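Lemma 2.14. Let $\omega$ be a primitive root of unity of order $m$. Let $\mu_1$ and $\mu_2$ be two probability distributions over $\mathbb{Z}_m^n$. If $\left| \mathbb{E}_{x \sim \mu_1, y \sim \mu_2}\left[\omega^{\langle x, y \rangle}\right] \right| \ge \epsilon$, then $cp(\mu_1)\, cp(\mu_2) \ge \epsilon^2/m^n$.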

3 Proof of Theorem 1

In this section we prove Theorem 1, restated here with explicit constants.

Theorem 3.1. Let $m \ge 2$, $2 \le q \le m$ and $n$ be arbitrary positive integers. Then

$$MV(m,n,q) \le 12q \cdot q^{24(1 + \log q^{10q})}\, m^{n/2}.$$

For the purpose of the proof, we introduce the following notation that will be used only in this 
section. 

Definition 3.2 ($MV_{r_1,r_2}(m,n,q)$). Let $r_1, r_2$ be integers such that $r_1 r_2 \mid m$. We denote by $MV_{r_1,r_2}(m,n,q)$ the maximum size of a $q$-restricted MV family $(U, V)$ in $\mathbb{Z}_m^n$ satisfying

• $U^{(r_1)}$ and $V^{(r_2)}$ are constants.

• $\langle u, v \rangle = 0 \pmod{r_1 r_2}$ for all $u \in U$, $v \in V$.

Note that $MV_{1,1}(m,n,q) = MV(m,n,q)$ (with the convention that $x \pmod 1 = 0$ for any integer $x$).

Before we go to the proof of Theorem 3.1, we need the following claims.

Claim 3.3. Let $(U,V)$ be a $q$-restricted matching vector family in $\mathbb{Z}_m^n$. Then, without loss of generality, $m$ has at most $q$ prime factors.

Proof. Assume $m = \prod_{i=1}^{r} p_i^{e_i}$ with possibly $r > q$. Let $v_1, \ldots, v_q \in \mathbb{Z}_m$ be the $q$ possible nonzero values that the inner products $\langle u, v \rangle$ attain. For each $v_j$ there is some prime $p_{i_j}$ where $v_j \ne 0 \pmod{p_{i_j}^{e_{i_j}}}$. So, we can replace $m$ with just $\prod_{j=1}^{q} p_{i_j}^{e_{i_j}}$ and discard all primes other than $p_{i_1}, \ldots, p_{i_q}$. □

Claim 3.4. If $N$ has $r$ prime factors, then $|\{x \in \mathbb{Z}_N : \mathrm{order}(x) \le N/S\}| \le N/S \cdot (\log S)^r$.

Proof. Assume $N = \prod_{i=1}^{r} p_i^{e_i}$. An element $x$ with $\mathrm{order}(x) \le N/S$ is divisible by some $\prod_{i=1}^{r} p_i^{f_i} \ge S$. Let $T = \{(f_1, \ldots, f_r) : \prod_i p_i^{f_i} \ge S\}$. Define a partial order on $T$ by $(f_1, \ldots, f_r) \le (f_1', \ldots, f_r')$ if $f_i \le f_i'$ for all $i$. Let $T'$ be a subset of $T$ such that for any $t \in T$ there is $t' \in T'$ such that $t' \le t$. Note that if $x$ has order $\le N/S$ then $x$ must be divisible by $\prod_i p_i^{f_i}$ for some $(f_1, \ldots, f_r)$ in $T'$. So, the number of elements of order $\le N/S$ is at most $N|T'|/S$. We can bound the size of $T'$ as follows: any element $f_i$ is between 0 and $\log_{p_i} S$, since clearly if $f_i$ is larger we can reduce $f_i$ by one. So, $|T'| \le \prod_{i=1}^{r} (\log S/\log p_i) \le (\log S)^r$. □

The proof of Theorem 3.1 will follow immediately from the following two lemmas, which will be proved below.

Lemma 3.5. Let $m = r_1 r_2 r_3$ where $r_1, r_2, r_3$ are arbitrary positive integers such that $r_3 \ge 2$. Let $q \ge 2$, $t \ge 12q$ and $n$ be arbitrary positive integers. Let $(U,V)$ be a $q$-restricted matching vector family in $\mathbb{Z}_m^n$ with $|(U,V)| = t$ such that

• $U^{(r_1)}$ and $V^{(r_2)}$ are constants.

• $\langle u,v\rangle \equiv 0 \pmod{r_1 r_2}$ for all $u \in U$, $v \in V$.

Then, there exists $s \mid r_3$ with $s \ge \max\{2, r_3/q^{10q}\}$ and a $q$-restricted matching vector family $(U',V') \subseteq (U,V)$ such that $|(U',V')| \ge s^{-n/2} q^{-24}\, t$ where

• $\langle u',v'\rangle \equiv 0 \pmod{r_1 r_2 s}$ for all $u' \in U'$, $v' \in V'$.

• Either $U'^{(r_1 s)}$ is constant or $V'^{(r_2 s)}$ is constant.

Applying Lemma 3.5 iteratively we can prove the following bound.

Lemma 3.6. $MV_{r_1,r_2}(m,n,q) \le 12q \cdot q^{24\log\frac{m}{r_1 r_2}} \left(\frac{m}{r_1 r_2}\right)^{n/2}$.

Given Lemma 3.6 and Lemma 3.5 we now show how to deduce Theorem 3.1.

Proof of Theorem 3.1. Observe that for any matching vector family $(U,V)$ in $\mathbb{Z}_m^n$, $U^{(1)}$ and $V^{(1)}$ are constants and $\langle u,v\rangle \equiv 0 \pmod 1$ for all $u \in U$, $v \in V$. Thus, $MV(m,n,q) = MV_{1,1}(m,n,q)$.

Case 1: $m \le q^{10q}$. Applying Lemma 3.6 we get $MV(m,n,q) = MV_{1,1}(m,n,q) \le 12q \cdot q^{24\log m}\, m^{n/2} \le 12q \cdot q^{24(1+\log q^{10q})}\, m^{n/2}$.

Case 2: $m > q^{10q}$. By Lemma 3.6 we know that for $s \ge m/q^{10q}$, $MV_{1,s}(m,n,q) \le 12q \cdot q^{24\log\frac{m}{s}} \left(\frac{m}{s}\right)^{n/2} \le 12q \cdot q^{24\log q^{10q}} \left(\frac{m}{s}\right)^{n/2}$. Similarly, we have for $s \ge m/q^{10q}$, $MV_{s,1}(m,n,q) \le 12q \cdot q^{24\log q^{10q}} \left(\frac{m}{s}\right)^{n/2}$.

Now, suppose there is a $q$-restricted MV family $(U,V)$ in $\mathbb{Z}_m^n$ of size $t > 12q \cdot q^{24(1+\log q^{10q})}\, m^{n/2}$. Applying Lemma 3.5 with $r_1 = r_2 = 1$, we get a $q$-restricted MV family $(U',V') \subseteq (U,V)$ of size

$$t' \ge s^{-n/2} q^{-24}\, t > 12q \cdot q^{24\log q^{10q}} \left(\frac{m}{s}\right)^{n/2},$$

where $s \ge m/q^{10q}$, such that

• $\langle u',v'\rangle \equiv 0 \pmod s$ for all $u' \in U'$, $v' \in V'$.

• Either $U'^{(s)}$ is constant or $V'^{(s)}$ is constant.

But, by the previous paragraph, we have for $s \ge m/q^{10q}$ that $MV_{s,1}(m,n,q)$ and $MV_{1,s}(m,n,q)$ are at most $12q \cdot q^{24\log q^{10q}} \left(\frac{m}{s}\right)^{n/2}$. This leads to a contradiction. □



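3.1 Proof of Lemma 3.5

By assumption we have that $\langle u,v\rangle \equiv 0 \pmod{r_1 r_2}$ for all $u \in U$, $v \in V$. So, we can consider $\frac{\langle u,v\rangle}{r_1 r_2} \in \mathbb{Z}_{r_3}$. Also, by hypothesis, the inner products occupy $q' \le q$ residues in $\mathbb{Z}_{r_3}$. We have that

• For $1 \le i \le t$, $\frac{\langle u_i,v_i\rangle}{r_1 r_2} \equiv 0 \pmod{r_3}$ since $\langle u_i,v_i\rangle \equiv 0 \pmod m$.

• For $1 \le i,j \le t$, $i \ne j$, $\frac{\langle u_i,v_j\rangle}{r_1 r_2} \not\equiv 0 \pmod{r_3}$ since $\langle u_i,v_j\rangle \not\equiv 0 \pmod m$.

Let $\mu$ denote the distribution over $\mathbb{Z}_{r_3}$ defined by $\frac{\langle u_i,v_j\rangle}{r_1 r_2} \bmod r_3$ where $u_i, v_j$ are drawn independently and uniformly from $U, V$ respectively.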

Case 1: $4q' \ge r_3$. Observe that $\mu$ outputs $0$ only when $i = j$. Therefore, $\Pr[\mu = 0] = 1/t \le 1/12q' \le 1/3r_3$. On the other hand, $\Pr[U_{r_3} = 0] = 1/r_3$. This implies that $\Delta(\mu, U_{r_3}) \ge 1/3r_3$. Thus, applying Lemma 2.13 with $\omega = \exp(2\pi i/r_3)$, we get that for some $1 \le j \le r_3 - 1$,

$$\left|\mathbb{E}_{x\sim\mu}\left[\omega^{jx}\right]\right| \ge \frac{1}{3 r_3^{3/2}} \ge \frac{1}{12\, q'^{3/2}}.$$

Let $\omega' = \omega^j$ and let $\mathrm{ord}(\omega')$ (the order of $\omega'$) be $s = r_3/\gcd(r_3, j)$. Also, note that as $j \ge 1$, we have $s \ge 2$. Also, trivially, $s \ge r_3/q'^{10q'} \ge r_3/q^{10q}$.
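The order formula $s = r_3/\gcd(r_3, j)$ for $\omega^j$ is a standard fact about cyclic groups. The sketch below (function names are hypothetical, introduced only for illustration) compares it against a brute-force computation of the smallest $k \ge 1$ with $jk \equiv 0 \pmod{r_3}$.

```python
from math import gcd

def order_of_power(r, j):
    """Order of w**j where w is a primitive r-th root of unity."""
    return r // gcd(r, j)

def order_brute(r, j):
    """Smallest k >= 1 such that j * k is divisible by r."""
    k = 1
    while (j * k) % r != 0:
        k += 1
    return k
```

For every $1 \le j \le r - 1$ the order is a divisor of $r$ that is at least 2, which is the property used in Case 1.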

Case 2: $4q' < r_3$. Let $X$ be the random variable that picks a random $0 \le j \le r_3 - 1$ and outputs $\left|\mathbb{E}_{x\sim\mu}\left[(\omega^j)^x\right]\right|$. We will now show that with significant probability $X^2 \ge 1/2q'$. First observe that $X \le 1$. On the other hand, we will show that $\mathbb{E}[X^2]$ is large. To see this, let $Z = \{z_1, \ldots, z_{q'}\}$ be the $q'$ residues forming the support of $\mu$. Also, for $1 \le i \le q'$, let $\alpha_i \stackrel{\mathrm{def}}{=} \mu(z_i)$. Then,

$$\mathbb{E}_j\left[X^2\right] = \mathbb{E}_j\Big[\sum_{1\le i,i'\le q'} \alpha_i\alpha_{i'}\,\omega^{j(z_i - z_{i'})}\Big] = \sum_{1\le i\le q'} \alpha_i^2 \ge \frac{1}{q'}.$$

Therefore, we claim that $\Pr[X^2 \ge 1/2q'] \ge 1/2q' \ge 1/2q$. If not, then

$$\mathbb{E}_j[X^2] = \Pr[X^2 \ge 1/2q']\,\mathbb{E}_j[X^2 \mid X^2 \ge 1/2q'] + \Pr[X^2 < 1/2q']\,\mathbb{E}_j[X^2 \mid X^2 < 1/2q'] < \frac{1}{2q'} + \frac{1}{2q'} = \frac{1}{q'},$$

which is a contradiction.
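The second-moment identity behind this step, $\mathbb{E}_j\,|\sum_i \alpha_i \omega^{j z_i}|^2 = \sum_i \alpha_i^2$ for distinct residues $z_i$, is an instance of Parseval's identity on $\mathbb{Z}_{r_3}$. A small numerical sketch follows; the function name and the sample distribution are illustrative assumptions, not taken from the paper.

```python
import cmath

def second_moment(alpha, z, r):
    """E over uniform k in Z_r of |sum_i alpha_i * w**(k * z_i)|**2, w = exp(2*pi*i/r)."""
    total = 0.0
    for k in range(r):
        coeff = sum(a * cmath.exp(2j * cmath.pi * k * zi / r)
                    for a, zi in zip(alpha, z))
        total += abs(coeff) ** 2
    return total / r
```

For a distribution supported on $q'$ residues, Cauchy-Schwarz gives $\sum_i \alpha_i^2 \ge 1/q'$, which is the lower bound used in the averaging argument.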

By the above, we already have that there exists some $\omega'$ such that $\left|\mathbb{E}_{x\sim\mu}\left[(\omega')^x\right]\right| \ge 1/\sqrt{2q'}$ and $\mathrm{ord}(\omega') \ge 2$, since $r_3/2q' > 1$ and thus $\omega'$ is not trivial.

Now, we shall show the existence of $\omega'$ of much higher order provided $r_3 \ge q'^{10q'}$. By Claim 3.4 for $S = q'^{10q'}$ and $N = r_3$, and noting that $r_3$ has at most $q$ prime factors by Claim 3.3, we have

$$\Pr_j\left[\mathrm{ord}(\omega^j) \le r_3/S\right] \le 1/4q'.$$

Thus, with probability at least $1/2q' - 1/4q' = 1/4q'$, a random $j$ satisfies

• $\left|\mathbb{E}_{x\sim\mu}\left[(\omega^j)^x\right]\right| \ge 1/\sqrt{2q'} \ge 1/(12\, q'^{3/2})$

• $s = \mathrm{ord}(\omega^j) \ge r_3/S$

Also, as $r_3/4q' > 1$ the above two conditions are true for some $j \ne 0$.

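Now, we combine the above two cases as follows. Let $\omega' = \omega^j$ and $\epsilon = \frac{1}{12 q^{3/2}}$. We have shown by the above case-by-case analysis that

• $\left|\mathbb{E}_{x\sim\mu}\left[(\omega')^x\right]\right| \ge \epsilon$

• $s = \mathrm{ord}(\omega')$ is such that $s \ge \max\{2, r_3/q^{10q}\}$

Using the Cauchy-Schwarz inequality twice we get

$$\begin{aligned}
&\left|\mathbb{E}_{u\sim U,\,v\sim V}\left[(\omega')^{\langle u,v\rangle/r_1 r_2}\right]\right| \ge \epsilon \\
\Rightarrow\;& \left|\mathbb{E}_{u,\tilde{u}\sim U,\,v\sim V}\left[(\omega')^{\langle u-\tilde{u},\,v\rangle/r_1 r_2}\right]\right| \ge \epsilon^2 \\
\Rightarrow\;& \left|\mathbb{E}_{u,\tilde{u}\sim U,\,v,\tilde{v}\sim V}\left[(\omega')^{\langle u-\tilde{u},\,v-\tilde{v}\rangle/r_1 r_2}\right]\right| \ge \epsilon^4 \\
\Rightarrow\;& \left|\mathbb{E}_{u,\tilde{u}\sim U,\,v,\tilde{v}\sim V}\left[(\omega')^{\langle (u-\tilde{u})/r_1,\,(v-\tilde{v})/r_2\rangle}\right]\right| \ge \epsilon^4.
\end{aligned}$$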
We need to explain the last expression. Since by assumption $U^{(r_1)}$ and $V^{(r_2)}$ are constants, $(u - \tilde{u})/r_1 \in \mathbb{Z}_{m/r_1}^n$ and $(v - \tilde{v})/r_2 \in \mathbb{Z}_{m/r_2}^n$ are well defined. Thus, we can fix $\tilde{u}$ and $\tilde{v}$ by an averaging argument such that

$$\left|\mathbb{E}_{u\sim U,\,v\sim V}\left[(\omega')^{\langle (u-\tilde{u})/r_1,\,(v-\tilde{v})/r_2\rangle}\right]\right| \ge \epsilon^4.$$



Let $U' = (u'_1, u'_2, \ldots, u'_t)$, $V' = (v'_1, v'_2, \ldots, v'_t)$ where $u'_i = (u_i - \tilde{u})/r_1$ and $v'_i = (v_i - \tilde{v})/r_2$. Notice that $U'$ and $V'$ are not assumed to be a MV family (later we will derive from them a MV family). We now define two probability distributions $\mu^{U'}$ and $\mu^{V'}$ over $\mathbb{Z}_s^n$. For each $w \in \mathbb{Z}_s^n$, let $\mu^{U'}(w) = |B_s(w,U')|/|U'|$ and $\mu^{V'}(w) = |B_s(w,V')|/|V'|$. That is, $\mu^{U'}(w)$ is the probability that $u'^{(s)} = w$ where $u'$ is chosen uniformly in $U'$, and similarly for $\mu^{V'}(w)$. Therefore, since the order of $\omega'$ is $s$, we have that

$$\left|\mathbb{E}_{w_1\sim\mu^{U'},\,w_2\sim\mu^{V'}}\left[(\omega')^{\langle w_1,w_2\rangle}\right]\right| \ge \epsilon^4.$$

Recalling that $s$ is the order of $\omega'$ and applying Lemma 2.14 we get $cp(\mu^{U'})\,cp(\mu^{V'}) \ge \epsilon^8/s^n$.

Therefore, one of $cp(\mu^{U'})$, $cp(\mu^{V'})$, say $cp(\mu^{U'})$, is at least $\epsilon^4/s^{n/2}$. Let $w^*$ be the point of maximum probability mass given by $\mu^{U'}$. Then,

$$\mu^{U'}(w^*) = \mu^{U'}(w^*) \sum_{w} \mu^{U'}(w) \ge \sum_{w} \mu^{U'}(w)^2 = cp(\mu^{U'}) \ge \epsilon^4/s^{n/2}.$$



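Now, $\mu^{U'}(w^*) \ge \epsilon^4/s^{n/2}$ means that $\left|\{u \in U : \frac{u-\tilde{u}}{r_1} \equiv w^* \pmod s\}\right| \ge t\epsilon^4/s^{n/2}$. Equivalently,

$$\left|\{u \in U : u - \tilde{u} \equiv r_1 w^* \pmod{r_1 s}\}\right| \ge t\epsilon^4/s^{n/2}.$$

Let $T' = (i : u_i \equiv \tilde{u} + r_1 w^* \pmod{r_1 s})$. Now, define $U'' = (u_i : i \in T')$ and $V'' = (v_i : i \in T')$. Observe that $(U'',V'')$ is a matching vector family in $\mathbb{Z}_m^n$ such that

• $U''^{(r_1 s)}$ and $V''^{(r_2)}$ are constants.

• $|(U'',V'')| \ge t\,(\epsilon^4/s^{n/2})$.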

The only thing left is to show that $\langle u,v\rangle \equiv 0 \pmod{r_1 r_2 s}$ for all $u \in U''$, $v \in V''$. This may not be true in general. However, we can take a large subset of the matching vector family so that the resulting matching vector family satisfies this condition. To see this, let $u \in U''$, $v \in V''$ be arbitrary. Now, $u = r_1 s \cdot u' + u_0$ and $v = r_2 \cdot v' + v_0$ where $u', v'$ depend on $u, v$ respectively and $u_0, v_0$ are independent of $u, v$. Then,

$$\langle u,v\rangle = r_1 r_2 s\langle u',v'\rangle + r_1 s\langle u',v_0\rangle + r_2\langle u_0,v'\rangle + \langle u_0,v_0\rangle.$$

As $u$ varies over $U''$, $\langle u',v_0\rangle$ takes at most $q$ values modulo $r_2$. Hence, $r_1 s\langle u',v_0\rangle$ takes at most $q$ values modulo $r_1 r_2 s$. Therefore, there exist at least $(1/q)|U''|$ elements of $U''$ such that $r_1 s\langle u',v_0\rangle$ is a constant modulo $r_1 r_2 s$. We take the corresponding elements from $V''$ to form a matching vector family $(U''',V''') \subseteq (U'',V'')$. We apply another round using the same idea on $U''',V'''$, this time ensuring that $r_2\langle u_0,v'\rangle$ is constant modulo $r_1 r_2 s$ as $v$ varies over a large fraction of $V'''$. Thus, we end up with $\hat{V}$ of size at least $(1/q)|V'''|$ such that $r_2\langle u_0,v'_i\rangle$ is a constant modulo $r_1 r_2 s$. We take the corresponding subset $\hat{U}$ from $U'''$ so that $(\hat{U},\hat{V}) \subseteq (U''',V''')$ is a matching vector family. Denote the size of $(\hat{U},\hat{V})$ by $\hat{t}$. Note that $\hat{U} = (\hat{u}_1, \ldots, \hat{u}_{\hat{t}})$, $\hat{V} = (\hat{v}_1, \ldots, \hat{v}_{\hat{t}})$ is a matching vector family in $\mathbb{Z}_m^n$ of size at least

$$(1/q^2)\, t\, (\epsilon^4/s^{n/2}) = s^{-n/2} q^{-(8+4\log_q 12)}\, t \ge s^{-n/2} q^{-(8+4\log_2 12)}\, t \ge s^{-n/2} q^{-24}\, t.$$

Also, as $\langle \hat{u},\hat{v}\rangle$ is a constant modulo $r_1 r_2 s$ for $\hat{u} \in \hat{U}$, $\hat{v} \in \hat{V}$, and $\langle \hat{u}_i,\hat{v}_i\rangle \equiv 0 \pmod{r_1 r_2 s}$, we get that $\langle \hat{u},\hat{v}\rangle \equiv 0 \pmod{r_1 r_2 s}$ for all $\hat{u} \in \hat{U}$, $\hat{v} \in \hat{V}$. This concludes the proof. □



3.2 Proof of Lemma 3.6

We prove the lemma by backward induction on $r_1 r_2 \mid m$. That is, to prove the claim about $MV_{r_1,r_2}(m,n,q)$, we assume the inductive hypothesis for $MV_{r'_1,r'_2}(m,n,q)$ where $r'_1 r'_2 > r_1 r_2$ and $r'_1 r'_2 \mid m$.

Base Case. The base case of $r_1 r_2 = m$ is trivial. To see this, observe that if $\langle u,v\rangle \equiv 0 \pmod m$ for all $u \in U$, $v \in V$, then by the definition of a matching vector family in $\mathbb{Z}_m^n$, the size of such a family cannot exceed 1. Hence, for $r_1 r_2 = m$, $MV_{r_1,r_2}(m,n,q) = 1 \le 12q \cdot q^{24\log\frac{m}{r_1 r_2}} \left(\frac{m}{r_1 r_2}\right)^{n/2}$.

Inductive Step. Let $m = r_1 r_2 r_3$ with $r_1 r_2 < m$ (that is, $r_3 \ge 2$). By the inductive hypothesis we have $MV_{r'_1,r'_2}(m,n,q) \le 12q \cdot q^{24\log\frac{m}{r'_1 r'_2}} \left(\frac{m}{r'_1 r'_2}\right)^{n/2}$ for all $r'_1, r'_2$ such that $r'_1 r'_2 > r_1 r_2$ and $r'_1 r'_2 \mid m$. We need to show that $MV_{r_1,r_2}(m,n,q) \le 12q \cdot q^{24\log\frac{m}{r_1 r_2}} \left(\frac{m}{r_1 r_2}\right)^{n/2}$. Suppose this is false, so that there exists a $q$-restricted matching vector family $(U,V)$ in $\mathbb{Z}_m^n$ with $U = (u_1, \ldots, u_t)$, $V = (v_1, \ldots, v_t)$ where $t > 12q \cdot q^{24\log\frac{m}{r_1 r_2}} \left(\frac{m}{r_1 r_2}\right)^{n/2}$ such that

• $U^{(r_1)}$ and $V^{(r_2)}$ are constants.

• $\langle u,v\rangle \equiv 0 \pmod{r_1 r_2}$ for all $u \in U$, $v \in V$.

Note that $t \ge 12q$. Therefore, applying Lemma 3.5 there exists $s \mid r_3$ with $s \ge 2$ and a matching vector family $(U',V') \subseteq (U,V)$ such that $|(U',V')| \ge s^{-n/2} q^{-24}\, t$ where

• $\langle u',v'\rangle \equiv 0 \pmod{r_1 r_2 s}$ for all $u' \in U'$, $v' \in V'$.

• either $U'^{(r_1 s)}$ is constant or $V'^{(r_2 s)}$ is constant.

Without loss of generality, we assume that $U'^{(r_1 s)}$ is a constant. Therefore,



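$$\begin{aligned}
|(U',V')| &> s^{-n/2} q^{-24} \cdot 12q \cdot q^{24\log\frac{m}{r_1 r_2}} \left(\frac{m}{r_1 r_2}\right)^{n/2} \\
&= 12q \cdot q^{24\log\frac{m}{2 r_1 r_2}} \left(\frac{m}{r_1 r_2 s}\right)^{n/2} \\
&\ge 12q \cdot q^{24\log\frac{m}{r_1 r_2 s}} \left(\frac{m}{r_1 r_2 s}\right)^{n/2},
\end{aligned}$$

where the last inequality used the fact that $s \ge 2$. This however contradicts the inductive hypothesis. □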
4 Matrices over $\mathbb{Z}_m$

Notations: For a $t \times s$ matrix $M$ over $\mathbb{Z}_m$ and for lists $T \subseteq [t]$, $S \subseteq [s]$, the $T \times S$ submatrix of $M$ is the matrix with rows in $T$ and columns in $S$. For $i \in [t]$ and $j \in [s]$ we denote the $i$'th row of $M$ by $M(i,:)$ and the $j$'th column by $M(:,j)$.

Definition 4.1 (Span of a set). For $A \subseteq \mathbb{Z}_m^n$ let $\mathrm{span}(A)$ denote the additive subgroup generated by $A$. We say that a set $A$ spans $u \in \mathbb{Z}_m^n$ if $u \in \mathrm{span}(A)$.

Definition 4.2 (Rank of a matrix over $\mathbb{Z}_m$). Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$. Then $\mathrm{rank}(M)$ is the smallest $r$ such that $M = AB$ where $A$ is a $t \times r$ matrix over $\mathbb{Z}_m$ and $B$ is an $r \times t$ matrix over $\mathbb{Z}_m$.

Definition 4.3 (Column rank of a matrix over $\mathbb{Z}_m$). Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$. Let $\mathrm{colspan}(M)$ denote the subgroup of $\mathbb{Z}_m^t$ generated by the columns of $M$. The column rank of $M$ over $\mathbb{Z}_m$ is defined as

$$\mathrm{colrank}(M) = \log_m |\mathrm{colspan}(M)|.$$

The column rank is, in general, a real number in the range $[0, t]$.
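For small matrices the column rank can be computed directly from Definition 4.3 by enumerating the column span. The following brute-force sketch is only an illustration (the names `colspan` and `colrank` mirror the text, but the implementation is an assumption and is exponential in general): it closes the zero vector under addition of columns, which yields the generated subgroup because the group is finite.

```python
from math import log

def colspan(M, m):
    """Subgroup of Z_m^t generated by the columns of M (M is a list of rows)."""
    t = len(M)
    cols = [tuple(M[i][j] % m for i in range(t)) for j in range(len(M[0]))]
    span = {tuple([0] * t)}
    frontier = list(span)
    while frontier:
        x = frontier.pop()
        for c in cols:
            y = tuple((a + b) % m for a, b in zip(x, c))
            if y not in span:
                span.add(y)
                frontier.append(y)
    return span

def colrank(M, m):
    """log_m of the size of the column span; a real number in general."""
    return log(len(colspan(M, m)), m)
```

For example, over $\mathbb{Z}_4$ the matrix $2I_2$ has a column span of size 4, so its column rank is 1, while the identity $I_2$ has column rank 2.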

Since the rank can behave in unexpected ways over Z m , we make sure to prove some of the 
basic facts that we will be using later on. 

Fact 4.4. Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$ and let $M'$ be any submatrix of $M$. Then $\mathrm{colrank}(M') \le \mathrm{colrank}(M)$.

Proof. Suppose $M'$ is given by the first $t'$ rows and the first $t''$ columns of $M$. We will define an injective map $f : \mathrm{colspan}(M') \to \mathrm{colspan}(M)$. Given any $x \in \mathrm{colspan}(M')$ we can write $x = \sum_{j=1}^{t''} \alpha_j \cdot M'(:,j)$ in some fixed way (there might be several choices of $\alpha_j$). Define $f(x) = \sum_{j=1}^{t''} \alpha_j \cdot M(:,j)$. Then, $x$ is clearly the restriction of $f(x)$ to the first $t'$ indices and so the map is injective. □

Fact 4.5. Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$ and let $s \mid m$. Then $\mathrm{rank}(M^{(s)}) \le \mathrm{rank}(M)$.

Proof. Suppose there exist a $t \times r$ matrix $A$ and an $r \times t$ matrix $B$ over $\mathbb{Z}_m$ such that $M = AB$. Then $M^{(s)} = A^{(s)} B^{(s)}$ and so the rank of $M^{(s)}$ is at most $r$. □



We will need the following claims relating the rank and the column rank of matrices over $\mathbb{Z}_m$.

Claim 4.6. Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$. Then,

$$\frac{\mathrm{rank}(M)}{\log m} \le \mathrm{colrank}(M) \le \mathrm{rank}(M).$$

Proof. Let $r = \mathrm{rank}(M)$ and $r' = \mathrm{colrank}(M)$. We first prove that $r' \le r$. This is equivalent to proving that $|\mathrm{colspan}(M)| \le m^r$. Let $M = AB$ where $A$ is a $t \times r$ matrix over $\mathbb{Z}_m$ and $B$ is an $r \times t$ matrix over $\mathbb{Z}_m$. Since the columns of $M$ are all in the span of the columns of $A$ we have that the column span of $M$ can contain at most $m^r$ elements.

We now prove that $r' \ge r/(\log m)$ or, equivalently, $|\mathrm{colspan}(M)| \ge 2^r$. Suppose in contradiction that $|\mathrm{colspan}(M)| < 2^r$. Take a minimal spanning set $S$ of $\mathrm{colspan}(M)$ (that is, a set that spans $\mathrm{colspan}(M)$ and such that no proper subset of it does). Suppose $|S| \ge r$ and consider all linear combinations (over $\mathbb{Z}_m$) of elements of $S$ with coefficients in $\{0,1\} \subseteq \mathbb{Z}_m$. Since $|\mathrm{colspan}(M)| < 2^r$ there are two distinct 0-1 linear combinations that map to the same element. This means that there is a linear combination of the elements of $S$ with coefficients in $\{0, 1, -1\}$, not all zero, that is equal to zero. Since both $1$ and $-1$ are invertible modulo $m$ we can write one of the elements of $S$ as a linear combination of the other elements. This contradicts the minimality of $S$ and so, we must have $|S| < r$. This implies that $\mathrm{rank}(M) < r$, a contradiction, since we can write $M$ as the product of the matrix with columns in $S$ with the matrix of coefficients giving the columns of $M$. □

Claim 4.7. Let $M$ be a $t \times t$ matrix over $\mathbb{Z}_m$, let $r = \mathrm{rank}(M)$. There exist $r'$ columns of $M$ that span the rest of $M$'s columns such that $r' \le r\log m$.

Proof. Take a minimal spanning set $S$ of the columns of $M$ (that is, a set that spans all other columns and such that no proper subset of it spans all columns). If $2^{|S|} > m^r$, then $2^{|S|} > |\mathrm{colspan}(M)|$ (by Claim 4.6) and we proceed as in the proof of Claim 4.6 above. If we look at all the 0-1 combinations of the columns of $S$, then there are two distinct 0-1 linear combinations of the columns that map to the same element of $\mathrm{colspan}(M)$. Thus, let $\sum_i \alpha_i S(:,i) = \sum_i \beta_i S(:,i)$ where $\alpha_i \ne \beta_i$ for at least one $i$, say $i_0$. Therefore, we have $\sum_i (\alpha_i - \beta_i) S(:,i) = 0$. Note that $(\alpha_{i_0} - \beta_{i_0}) = \pm 1$ and hence is invertible. This lets us write $S(:,i_0)$ as a linear combination of the remaining columns, contradicting the minimality of $S$. Thus, $r' = |S| \le r\log m$. □

The following claim shows that the column rank behaves similarly to rank in terms of subadditivity.

Claim 4.8. Let $A, B$ be $t \times t$ matrices over $\mathbb{Z}_m$. Then, $\mathrm{colrank}(A + B) \le \mathrm{colrank}(A) + \mathrm{colrank}(B)$.

Proof. We show that $|\mathrm{colspan}(A+B)| \le |\mathrm{colspan}(A)|\,|\mathrm{colspan}(B)|$. Note that $\mathrm{colspan}(A+B) \subseteq \mathrm{colspan}(A) + \mathrm{colspan}(B) \stackrel{\mathrm{def}}{=} \{a + b \mid a \in \mathrm{colspan}(A),\ b \in \mathrm{colspan}(B)\}$. Therefore, $|\mathrm{colspan}(A+B)| \le |\mathrm{colspan}(A) + \mathrm{colspan}(B)| \le |\mathrm{colspan}(A)|\,|\mathrm{colspan}(B)|$. □

Claim 4.9. Let $M$ be a $2t \times 2t$ matrix over $\mathbb{Z}_m$ such that

$$M = \begin{pmatrix} A & 0 \\ * & B \end{pmatrix}$$

where $A$, $B$ and $*$ are $t \times t$ matrices. Then, $\mathrm{colrank}(A) + \mathrm{colrank}(B) \le \mathrm{colrank}(M)$.

Proof. We show that $|\mathrm{colspan}(A)|\,|\mathrm{colspan}(B)| \le |\mathrm{colspan}(M)|$. Let $\mathrm{colspan}(A) = R_1$, $\mathrm{colspan}(B) = R_2$, $\mathrm{colspan}(M) = R$. We define $f : R_1 \times R_2 \to R$ and show that $f$ is injective. Given $r_1 \in R_1$ and $r_2 \in R_2$, let $\alpha_1, \ldots, \alpha_t$ and $\beta_1, \ldots, \beta_t$ denote coefficients for linear combinations of the columns of $A$ and $B$ respectively that give $r_1$ and $r_2$. There might be many such linear combinations but we fix one for each $r_i$. Then, $f(r_1,r_2) = \sum_{i=1}^{t} \alpha_i M(:,i) + \sum_{i=t+1}^{2t} \beta_{i-t} M(:,i)$. Now, given a column vector $f(r_1,r_2) \in R$, we uniquely identify $r_1$ and $r_2$ as follows. We look at the first $t$ rows and call it $s_1$. Now $s_1 = r_1$, and let $\alpha_1, \ldots, \alpha_t$ be the linear combination fixed for $r_1$ while defining $f$. Now, consider $f(r_1,r_2) - \sum_{i=1}^{t} \alpha_i M(:,i)$ and call the last $t$ rows $s_2$. Note that $s_2 = r_2$. □
Claim 4.10. Let $M$ be a $t \times t$ square matrix over $\mathbb{Z}_m$ with zero diagonal entries. If for some $s \mid m$, $\mathrm{colrank}(M^{(s)}) \le 2$, then there exist at least $t' = t/m^2$ indices such that $M$ restricted to those indices as rows and columns is the all zero matrix modulo $s$.

Proof. As $\mathrm{colrank}(M^{(s)}) \le 2$, it follows that $|\mathrm{colspan}(M^{(s)})| \le s^2 \le m^2$. Hence, $M^{(s)}$ has at most $m^2$ distinct columns. Therefore, there exists a set of indices $S$ of size $t' \ge t/m^2$ with $S = \{r_1, r_2, \ldots, r_{t'}\}$ such that all the columns $M^{(s)}(:,r_i)$ are identical. Also, as the diagonal elements are zero modulo $m$, they are zero modulo $s$. Thus, the $S \times S$ submatrix is the all zero matrix modulo $s$. □
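The pigeonhole step in this proof is constructive: group the column indices by their column modulo $s$ and take the largest group. A minimal sketch, with a hypothetical function name and assuming the input has zero diagonal modulo $s$ as in the claim:

```python
from collections import defaultdict

def zero_block_indices(M, s):
    """Largest set of indices whose columns of M coincide modulo s.

    If the diagonal of M is zero modulo s, the square submatrix of M
    on the returned indices is all zero modulo s.
    """
    t = len(M)
    groups = defaultdict(list)
    for j in range(t):
        col = tuple(M[i][j] % s for i in range(t))
        groups[col].append(j)
    return max(groups.values(), key=len)
```

For indices $i, j$ in the same group, entry $(i,j)$ equals entry $(i,i)$ modulo $s$ because columns $i$ and $j$ agree, and the diagonal entry is zero.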



5 Collision-Free MV families 

In the proof of Theorem 2 it will be useful to assume that the elements of the MV family do not 'collide' when reduced modulo an integer $s$ dividing $m$. In this section we develop the necessary machinery to allow for this assumption. We start by defining a collision free matching vector family.

Definition 5.1 (Collision free MV family). A collision free matching vector family $(U,V)$ in $\mathbb{Z}_m^n$ is a matching vector family such that for all $s \mid m$, $s \ge 2$, all elements of $U$ are distinct modulo $s$, and all elements of $V$ are distinct modulo $s$. Note that if $(U,V)$ is a collision free matching vector family, then so is any $(U',V') \subseteq (U,V)$.

Lemma 5.2. Let $m \ge 2$ be an arbitrary integer. Let $s$ be a divisor of $m$, such that $1 < s < m$. Let $(U,V)$ be a matching vector family in $\mathbb{Z}_m^n$ such that $\langle u,v\rangle \equiv 0 \pmod s$ for all $u \in U$, $v \in V$. Then, $|(U,V)| \le MV(m/s, n\log m)$.

Proof. Let $U = (u_1, u_2, \ldots, u_t)$ and $V = (v_1, v_2, \ldots, v_t)$. Recall that $P_{U,V}$ is the inner product matrix. We shall write $P_{U,V}$ as $P$ in the rest of the proof for brevity. Let $r = \mathrm{rank}(P) \le n$. Hence, by Claim 4.7 there exist $r' \le r \cdot \log m$ columns of $P$ which span all the columns of $P$. As each entry of $P$ is a multiple of $s$ we can define a matrix $P'$ over $\mathbb{Z}_{m/s}$ by $P' = (1/s)P$. We have

• $P'_{i,j} = 0$ if and only if $i = j$.

We next show that the $r'$ columns that span the columns of $P$ also span the columns in $P'$. Without loss of generality, let the first $r'$ columns of $P$ span the remaining columns of $P$. For any column $j$, let $P(:,j) = \sum_{i=1}^{r'} c_i P(:,i) \pmod m$. Since all entries of $P$ are divisible by $s$, we can divide the expression by $s$ and obtain that $P'(:,j) = \sum_{i=1}^{r'} c_i P'(:,i) \pmod{m/s}$. Hence, we deduce that $r_{P'} = \mathrm{rank}(P') \le r' \le r\log m \le n\log m$. This implies that $P' = AB$ for some $t \times r_{P'}$ matrix $A$ and some $r_{P'} \times t$ matrix $B$ over $\mathbb{Z}_{m/s}$. Thus, the rows of $A$ and the columns of $B$ form a matching vector family in $\mathbb{Z}_{m/s}^{r_{P'}}$. Therefore, $t \le MV(m/s, n\log m)$ as claimed. □

Lemma 5.3 (Bucket Lemma). For any $m$, let $(U,V)$ be a matching vector family in $\mathbb{Z}_m^n$. Let $1 < s < m$ be any divisor of $m$. Then, for any $w \in \mathbb{Z}_s^n$, $|B_s(w,U)| \le MV(m/s, n\log m)$. By symmetry, $|B_s(w,V)| \le MV(m/s, n\log m)$.

Proof. We prove that $|B_s(w,U)| \le MV(m/s, n\log m)$. For $U = (u_1, u_2, \ldots, u_t)$, consider any bucket $B_s(w,U) = U'$ (say). Let $U' = (u_{j_1}, u_{j_2}, \ldots, u_{j_{t'}})$ where $1 \le j_1 < j_2 < \cdots < j_{t'} \le t$. Let $V' = (v_{j_1}, v_{j_2}, \ldots, v_{j_{t'}})$. Now, for any $l \in [t']$, $\langle u_{j_l}, v_{j_l}\rangle \equiv 0 \pmod m$. Since all elements of $U'$ agree modulo $s$, it follows that $\langle u_{j_k}, v_{j_l}\rangle \equiv 0 \pmod s$ for any $k, l \in [t']$. By Lemma 5.2 on $(U',V')$, $t' \le MV(m/s, n\log m)$. □

We use the above lemma repeatedly to obtain a collision free matching vector family. 

Lemma 5.4. Let $m \ge 2$ be any positive integer. Suppose there is a matching vector family $(U,V)$ in $\mathbb{Z}_m^n$. Then, there exists a collision free matching vector family $(U',V') \subseteq (U,V)$ such that

$$|(U',V')| \ge \frac{|(U,V)|}{\left(\prod_{s \mid m,\ 1<s<m} MV(s, n\log m)\right)^2}.$$

Proof. We will get rid of collisions iteratively by repeatedly applying Lemma 5.3. Let us write the divisors of $m$ in ascending order as $2 \le s_1 < s_2 < \cdots < s_l \le m/2$. Perform the following operation for each $s \mid m$ starting from the smallest divisor greater than 1. For $0 \le i \le l$, let $U_i, V_i$ be the matching vector family after stage $i$ with $U_0 = U$ and $V_0 = V$. Now suppose that we have $U_i, V_i$ after the $i$'th stage such that there is no collision modulo $s_j$ in $U_i$ for $1 \le j \le i$. The $(i+1)$'th stage is performed as follows. Let us construct $U_{i+1}, V_{i+1}$ from $U_i, V_i$ to ensure no collision among the elements of $U_{i+1}$ modulo $s_{i+1}$ as well. For each $w \in \mathbb{Z}_{s_{i+1}}^n$, by Lemma 5.3, $|B_{s_{i+1}}(w,U_i)| \le MV(m/s_{i+1}, n\log m)$. Pick one element from each bucket in $U_i$ and the corresponding matching vector from $V_i$ to form $(U_{i+1},V_{i+1})$. Thus, $|(U_{i+1},V_{i+1})| \ge |(U_i,V_i)| / MV(m/s_{i+1}, n\log m)$. We end up with a matching vector family $U_l, V_l$ such that $|(U_l,V_l)| \ge \frac{|(U,V)|}{\prod_{s \mid m,\ 1<s<m} MV(m/s,\, n\log m)}$ and $U_l$ is collision free. We repeat the same process, this time pruning $V_l$ in order to make it collision free as well. Thus, eventually we end up with a collision free matching vector family $(U'_l,V'_l) \subseteq (U,V)$ such that

$$|(U'_l,V'_l)| \ge \frac{|(U_l,V_l)|}{\prod_{s \mid m,\ 1<s<m} MV(m/s,\, n\log m)} \ge \frac{|(U,V)|}{\left(\prod_{s \mid m,\ 1<s<m} MV(s,\, n\log m)\right)^2}.$$

□
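The pruning procedure in this proof can be sketched as follows. This is an illustrative implementation under stated assumptions: the function names are hypothetical, vectors are tuples of integers, and "pick one element from each bucket" is realized by keeping the first pair that lands in each bucket.

```python
def prune_collisions(U, V, s):
    """Keep one pair (u_i, v_i) per bucket of U modulo s."""
    seen = set()
    U2, V2 = [], []
    for u, v in zip(U, V):
        key = tuple(x % s for x in u)
        if key not in seen:
            seen.add(key)
            U2.append(u)
            V2.append(v)
    return U2, V2

def make_collision_free(U, V, m):
    """Prune U, then V, modulo every divisor s of m with 1 < s < m."""
    divisors = [s for s in range(2, m) if m % s == 0]
    for s in divisors:
        U, V = prune_collisions(U, V, s)
    for s in divisors:
        # Same procedure with the roles of U and V exchanged.
        V, U = prune_collisions(V, U, s)
    return U, V
```

Pruning $V$ in the second pass only removes pairs, so the absence of collisions in $U$ established by the first pass is preserved.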



6 Proof of Theorem 2

Before proceeding with the proof we give yet another definition.

Definition 6.1. Let $A, B \subseteq \mathbb{Z}_m^n$ be twin-free lists (or sets). Let $\omega$ be a primitive root of unity of order $m$. The duality measure of $A, B$ with respect to $\omega$ is defined as

$$D_\omega(A,B) = \left|\mathbb{E}_{a\sim A,\,b\sim B}\left[\omega^{\langle a,b\rangle}\right]\right|.$$



Notice that, if $\omega \ne 1$, $D_\omega(A,B) = 1$ implies that there is some $c \in \mathbb{Z}_m$ such that all the entries of the inner product matrix $P_{A,B}$ equal $c$. We often refer to such submatrices as monochromatic rectangles.
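The duality measure is straightforward to evaluate numerically on small lists. The sketch below is illustrative only; the function name and the `power` parameter (which selects the root $\omega^{\text{power}}$ with $\omega = \exp(2\pi i/m)$) are assumptions introduced here.

```python
import cmath

def duality_measure(A, B, m, power=1):
    """|E_{a in A, b in B} w**(power * <a, b>)| with w = exp(2*pi*i/m)."""
    total = 0j
    for a in A:
        for b in B:
            ip = sum(x * y for x, y in zip(a, b)) % m
            total += cmath.exp(2j * cmath.pi * power * ip / m)
    return abs(total) / (len(A) * len(B))
```

A monochromatic rectangle gives a value of 1, while inner products spread evenly over $\mathbb{Z}_m$ give a value close to 0.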

The following is an easy consequence of Lemma 2.13.

Lemma 6.2. Let $(U,V)$ be a MV family in $\mathbb{Z}_m^n$ of size $t \ge 3m$ and let $\omega = \exp(2\pi i/m)$ be a primitive root of unity of order $m$. Then there exists some $1 \le j \le m-1$ such that

$$D_{\omega^j}(U,V) \ge \frac{1}{3m^{3/2}}.$$



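Proof. Let $\mu$ be the random variable which chooses $u \in U$ and $v \in V$ randomly and outputs $\langle u,v\rangle$, and let $U_m$ be the uniform distribution over $\mathbb{Z}_m$. Now, $\Delta(\mu,U_m) \ge (1/2)\left(\Pr[U_m = 0] - \Pr[\mu = 0]\right) = (1/2)(1/m - 1/t) \ge 1/3m$ as $t \ge 3m$. By Lemma 2.13 for some $1 \le j \le m-1$,

$$\left|\mathbb{E}_{x\sim\mu}\left[\omega^{jx}\right]\right| \ge \frac{1}{3m^{3/2}}.$$

Thus, we have

$$\left|\mathbb{E}_{u\sim U,\,v\sim V}\left[\omega^{j\langle u,v\rangle}\right]\right| \ge \frac{1}{3m^{3/2}}$$

as claimed. □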



An important ingredient in the proof of Theorem 2 is the following lemma, referred to in the introduction as the 'sub-matrix lemma', which is a generalization of a result of [BSLZ11].

Lemma 6.3 (Sub-Matrix Lemma). Let $s, m, n \ge 2$ where $s$ divides $m$, and let $\omega$ be a primitive root of unity of order $s$. Let $A, B \subseteq \mathbb{Z}_s^n$ be two twin-free lists satisfying $D_\omega(A,B) \ge \frac{1}{3m^{3/2}}$. Let $\mathrm{rank}(P_{A,B}) = r \ge 2$. Then assuming Conjecture 1 (PFR conjecture), there exist lists $A' \subseteq A$, $B' \subseteq B$ such that $D_\omega(A',B') = 1$, where $|A'| \ge 2^{-c(m)r/\log r}|A|$, $|B'| \ge 2^{-c(m)r/\log r}|B|$ for some constant $c(m)$ which depends only on $m$.

Without loss of generality, we can assume $c(m) \ge 1$ above (it will be convenient to assume it in the proof of Theorem 2). In other words, we can replace the $c(m)$ above by $\max\{c(m), 1\}$. We postpone the proof of Lemma 6.3 to Section 7 and proceed now with the proof of Theorem 2.

We restate Theorem 2 here for convenience and with the explicit function $d(m)$.

Theorem 6.4. Let $n, m \ge 2$ be arbitrary positive integers. Then, assuming Conjecture 1 (PFR conjecture), we have

$$MV(m,n) \le 2^{d(m)\,n/\log n},$$

where $d(m) = 1200\,c(m)\,m^{6\log m}$ and $c(m)$ is as in Lemma 6.3.

Proof. We prove the theorem by induction on the number of (not necessarily distinct) prime factors 
of m. 



Choice of $d(m)$. Let $d, d_1, d_2, d_3 : \mathbb{Z}^+ \to \mathbb{R}$ be functions and $d_4$ be a constant. We want the following conditions to be satisfied for all $m, n \ge 2$.

1. $d(m), d_1(m), d_2(m), d_3(m)$ are monotonically increasing in $m$

2. $(2n)^m \le 2^{d(m)n/\log n}$

3. $(2m)^m \le 2^{d(m)n/\log n}$

4. $d(m) \ge d(m/2) \cdot 4m\log m$

5. $-d_2(m) + (1/2)\,d(m) \ge d(m/2)\log m$

6. $2^{(1/2)d(m)n/\log n} \ge 3m \cdot 2^{d_2(m)n/\log n}$

7. $d_2(m)\,n/\log n \ge 2\log m + d_3(m)\,n/\log n$

8. $d_3(m) \ge d_1(m) \cdot d_4 \cdot m\log m$

9. $d_4 \ge 300$

10. $d_1(m) \ge 2c(m)$

11. $d_2 \ge d_3 + 1$

It can be verified that the following choice for the functions meets the above conditions.

• $d(m) = 1200 \cdot c(m) \cdot m^{6\log m}$

• $d_1(m) = 2 \cdot c(m)$

• $d_2(m) = 602 \cdot c(m) \cdot m\log m$

• $d_3(m) = 600 \cdot c(m) \cdot m\log m$

• $d_4 = 300$

We shall explicitly mention which conditions of the above functions are being used in different 
parts of the proof. 
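Conditions 2 through 11 can be spot-checked numerically for the stated choice of functions. The sketch below makes several assumptions that are not in the text: $c(m)$ is replaced by the placeholder constant 1 (the true $c(m)$ comes from Lemma 6.3 and is not specified), logarithms are taken to base 2, $d$ is evaluated at the real number $m/2$ when $m$ is odd, and the exponential conditions 2, 3 and 6 are compared after taking $\log_2$ of both sides. Condition 1 (monotonicity) is not checked.

```python
from math import log2

D4 = 300  # the constant d_4

def d(m, c=1.0):
    return 1200 * c * m ** (6 * log2(m))

def d1(m, c=1.0):
    return 2 * c

def d2(m, c=1.0):
    return 602 * c * m * log2(m)

def d3(m, c=1.0):
    return 600 * c * m * log2(m)

def conditions_hold(m, n, c=1.0):
    """Map condition number -> bool for Conditions 2..11 (c is a placeholder)."""
    g = n / log2(n)
    return {
        2: m * log2(2 * n) <= d(m, c) * g,
        3: m * log2(2 * m) <= d(m, c) * g,
        4: d(m, c) >= d(m / 2, c) * 4 * m * log2(m),
        5: -d2(m, c) + 0.5 * d(m, c) >= d(m / 2, c) * log2(m),
        6: 0.5 * d(m, c) * g >= log2(3 * m) + d2(m, c) * g,
        7: d2(m, c) * g >= 2 * log2(m) + d3(m, c) * g,
        8: d3(m, c) >= d1(m, c) * D4 * m * log2(m),
        9: D4 >= 300,
        10: d1(m, c) >= 2 * c,
        11: d2(m, c) >= d3(m, c) + 1,
    }
```

Such a check is only a sanity test on a finite range of $m$ and $n$; it does not replace the verification claimed in the text.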

Base Case. The base case is where $m = p$ is prime. Lemma 2.9 implies that $MV(p,n) \le 1 + \binom{n+p-2}{p-1} \le (2\max\{n,p\})^p$. If we show $(2n)^p \le 2^{d(p)n/\log n}$ and $(2p)^p \le 2^{d(p)n/\log n}$ we will be done. Indeed, by the choice of $d(m)$ (Conditions 2 and 3) both of the above will hold.

Inductive Case. Let $n \ge 2$, $m \ge 2$ be arbitrary positive integers. Suppose, by induction, that $MV(s,n) \le 2^{d(s)n/\log n}$ for all $s \mid m$, $s < m$. We need to show that, assuming Conjecture 1,

$$MV(m,n) \le 2^{d(m)n/\log n}.$$

Suppose not. That is, there exists a matching vector family $(U,V)$ of size $t > 2^{d(m)n/\log n}$. First, we shall apply Lemma 5.4 to $(U,V)$ to obtain a large enough collision free matching vector family $(U',V')$.

A large collision free matching vector family. We show that $|(U',V')| > 2^{(1/2)d(m)n/\log n}$. Let $|(U',V')| = t'$. Observe that by Lemma 5.4, the inductive hypothesis and the monotonicity of $d(m)$ (Condition 1), $t' > 2^{d(m)n/\log n \,-\, 2m \cdot d(m/2) \cdot n\log m/\log n}$, where we have used a loose upper bound of $m$ for the number of factors of $m$. Now,

$$t' > 2^{(1/2)d(m)n/\log n}$$

if $d(m)\,n/\log n - 2m \cdot d(m/2) \cdot n\log m/\log n \ge (1/2)\,d(m)\,n/\log n$

$\Leftarrow\ d(m) \ge d(m/2) \cdot 4m\log m$

which is satisfied by the choice of $d(m)$ (Condition 4).
Two key claims. We will need two claims from which the inductive claim follows easily. We shall provide proofs to these claims after the proof of the inductive claim.

Claim 6.5. Let $(U,V)$ be a collision free matching vector family in $\mathbb{Z}_m^n$ with $|(U,V)| \ge 3m$ and $\mathrm{colrank}\left(P^{(s')}_{U,V}\right) > 2$ for all $s' \mid m$, $s' \ge 2$. Then, for some $s \mid m$, $s \ge 2$, there exists a collision free matching vector family $(U',V') \subseteq (U,V)$ in $\mathbb{Z}_m^n$ satisfying

• $|(U',V')| \ge 2^{-d_1(m)\,r_s/\log r_s}\,|(U,V)|$ where $r_s = \mathrm{rank}\left(P^{(s)}_{U,V}\right)$.

• Either $\mathrm{colrank}\left(P^{(s)}_{U',V'}\right) \le (3/4)\,\mathrm{colrank}\left(P^{(s)}_{U,V}\right)$ or $\mathrm{colrank}\left(P^{(s)}_{U',V'}\right) \le 2$.

Claim 6.6. Let $(U,V)$ be a collision free matching vector family in $\mathbb{Z}_m^n$ such that $|(U,V)| \ge 3m \cdot 2^{d_2(m)n/\log n}$. Then, there exists a collision free matching vector family $(U',V') \subseteq (U,V)$ in $\mathbb{Z}_m^n$ satisfying

• $|(U',V')| \ge 2^{-d_2(m)n/\log n}\,|(U,V)|$.

• $P^{(s)}_{U',V'}$ is the all zero matrix for some $s \mid m$, $s \ge 2$.

Let us proceed with the proof of the inductive claim assuming these two claims. We have a collision free matching vector family $(U',V')$ with $|(U',V')| > 2^{(1/2)d(m)n/\log n} \ge 3m \cdot 2^{d_2(m)n/\log n}$ (Condition 6 satisfied by the choice of $d(m)$, $d_2(m)$). Applying Claim 6.6 there exists a collision free matching vector family $(U'',V'') \subseteq (U',V') \subseteq (U,V)$ in $\mathbb{Z}_m^n$ satisfying

• $|(U'',V'')| > 2^{-d_2(m)n/\log n}\, 2^{(1/2)d(m)n/\log n}$.

• $P^{(s)}_{U'',V''}$ is the all zero matrix for some $s \mid m$, $s \ge 2$.

By the choice of $d(m)$, it can be verified that $-d_2(m) + (1/2)\,d(m) \ge d(m/2)\log m$ (Condition 5). Thus, $|(U'',V'')| > 2^{d(m/2)\,n\log m/\log n}$.

We now show that this is enough to get a contradiction. If $s = m$, we have $|(U'',V'')| \le 1$ as $(U'',V'')$ is a matching vector family in $\mathbb{Z}_m^n$. If $s < m$, by Lemma 5.2 and the inductive hypothesis, we have $|(U'',V'')| \le 2^{d(m/s)\,n\log m/\log(n\log m)} \le 2^{d(m/2)\,n\log m/\log n}$ by monotonicity of $d(m)$ (Condition 1). Thus, irrespective of $s$, $|(U'',V'')| \le 2^{d(m/2)\,n\log m/\log n}$, which is a contradiction. This completes the proof. □



Proof of Claim 6.5. Let $|(U,V)| = t \ge 3m$. Let $\omega$ be a root of unity of order $m$. By Lemma 6.2, for some $1 \le j \le m-1$, $D_{\omega^j}(U,V) \ge \frac{1}{3m^{3/2}}$. Let $\omega' = \omega^j$ and note that $s = m/\gcd(m,j)$ is the order of $\omega'$. Observe that $s \mid m$, $s \ge 2$ as $1 \le j \le m-1$. Recall from the statement of the claim that $r_s = \mathrm{rank}\left(P^{(s)}_{U,V}\right)$. Thus, by the collision free property of $(U,V)$,

$$D_{\omega'}\left(U^{(s)},V^{(s)}\right) = \left|\mathbb{E}_{u\sim U^{(s)},\,v\sim V^{(s)}}\left[(\omega')^{\langle u,v\rangle}\right]\right| = \left|\mathbb{E}_{u\sim U,\,v\sim V}\left[(\omega')^{\langle u,v\rangle}\right]\right| \ge \frac{1}{3m^{3/2}}.$$

Applying Lemma 6.3 on $U^{(s)}, V^{(s)}$ with $\omega'$ a primitive root of unity of order $s$, we can get a monochromatic $(R \times S)$ submatrix of $P^{(s)}_{U,V}$ with $|R| = |S| \ge 2^{-c(m)\,r_s/\log r_s}\,t$ (we can make $|R| = |S|$ as throwing away rows and columns from a monochromatic rectangle still keeps it monochromatic). Let $T = R \cap S$. We divide our analysis into two cases: either $|T| \ge |R|/2$ or $|T| < |R|/2$. In both cases, we shall exhibit a matching vector family as required in the statement of the claim.

Case 1: $|T| \ge |R|/2$. For $U = (u_1, u_2, \ldots, u_t)$, $V = (v_1, v_2, \ldots, v_t)$, let $U' = (u_j \mid j \in T)$ and $V' = (v_j \mid j \in T)$, and $P' = P_{U',V'}$. Now, as $P'^{(s)}$ is monochromatic, and $\langle u_j,v_j\rangle \equiv 0 \pmod s$ for $j \in T$, we have $\langle u',v'\rangle \equiv 0 \pmod s$ for all $u' \in U'$, $v' \in V'$. Observe that

• $|(U',V')| \ge 2^{-1-c(m)\,r_s/\log r_s}\,t \ge 2^{-2c(m)\,r_s/\log r_s}\,t \ge 2^{-d_1(m)\,r_s/\log r_s}\,t$ (by the choice of $d_1(m)$, Condition 10)

• $\mathrm{colrank}\left(P^{(s)}_{U',V'}\right) = 0 \le 2$

This finishes Case 1.

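Case 2: $|T| < |R|/2$. Let $R' = R \setminus T$ and $S' = S \setminus T$. Note that $R' \cap S' = \emptyset$ and $|R'| = |S'|$. Consider the $(R' \cup S') \times (R' \cup S')$ submatrix of $P_{U,V}$. Call it $P'$. Note that

$$P'^{(s)} = \begin{pmatrix} P'_1 & C \\ * & P'_2 \end{pmatrix}$$

where $P'_1$ and $P'_2$ are the $R' \times R'$ and the $S' \times S'$ submatrices of $P^{(s)}_{U,V}$ respectively and $C$ is monochromatic. We add a matrix of column rank at most 1 to $P'^{(s)}$ to yield $P''^{(s)}$, which is the same as $P'^{(s)}$ except that $C$ is replaced by the all zero block matrix. Thus,

$$P''^{(s)} = \begin{pmatrix} P'_1 & 0 \\ * & P'_2 \end{pmatrix}.$$

Note that by Claim 4.8, $\mathrm{colrank}\left(P''^{(s)}\right) \le \mathrm{colrank}\left(P'^{(s)}\right) + 1$. Now, using Claim 4.9, $\mathrm{colrank}(P'_1) + \mathrm{colrank}(P'_2) \le \mathrm{colrank}\left(P'^{(s)}\right) + 1 \le \mathrm{colrank}\left(P^{(s)}_{U,V}\right) + 1 \le (3/2)\,\mathrm{colrank}\left(P^{(s)}_{U,V}\right)$ as $\mathrm{colrank}\left(P^{(s)}_{U,V}\right) > 2$. Therefore, one of $P'_1, P'_2$, say $P'_1$, satisfies $\mathrm{colrank}(P'_1) \le (3/4)\,\mathrm{colrank}\left(P^{(s)}_{U,V}\right)$. Construct the matching vector family $(U',V')$ as follows. Let $U' = (u_j \mid j \in R')$ and $V' = (v_j \mid j \in R')$. Again observe that

• $|(U',V')| \ge 2^{-1-c(m)\,r_s/\log r_s}\,t \ge 2^{-2c(m)\,r_s/\log r_s}\,t \ge 2^{-d_1(m)\,r_s/\log r_s}\,t$ (by the choice of $d_1(m)$, Condition 10).

• $\mathrm{colrank}\left(P^{(s)}_{U',V'}\right) \le (3/4)\,\mathrm{colrank}\left(P^{(s)}_{U,V}\right)$

This completes the proof of Case 2. □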



Proof of Claim 6.6. We will use Claim 6.5 iteratively. For this, we first set up some notations.

The setup. Define a sequence of collision free matching vector families for $i = 0, \ldots, z$.

• $(U,V) = (U_0,V_0), (U_1,V_1), \ldots$

• Let $t_i = |(U_i,V_i)|$.

• Each step $i$ has label $s_i \mid m$ (this label will be given by Claim 6.5).

• Let $cr_i : \mathbb{Z}^+ \to \mathbb{R}$ be defined by $cr_i(s) = \mathrm{colrank}\left(P^{(s)}_{U_i,V_i}\right)$.

• Let $r_i : \mathbb{Z}^+ \to \mathbb{Z}$ be defined by $r_i(s) = \mathrm{rank}\left(P^{(s)}_{U_i,V_i}\right)$.

Invariants. We will show how to go from step $i$ to step $i+1$. We stop after stage $z$ when $cr_z(s) \le 2$ for some $s \mid m$, $s \ge 2$. We shall maintain the following invariants for $0 \le i \le z-1$.

• $(U_{i+1},V_{i+1}) \subseteq (U_i,V_i)$ and hence is a collision free matching vector family in $\mathbb{Z}_m^n$.

• $t_{i+1} \ge 2^{-d_1(m)\,r_i(s_i)/\log r_i(s_i)}\,t_i$.

• $cr_{i+1}(s_i) \le (3/4)\,cr_i(s_i)$ or $cr_{i+1}(s_i) \le 2$.

• $cr_{i+1}(s') \le cr_i(s')$ for all $s' \mid m$.

Step $i$ → Step $i+1$. We state a claim that we will prove below.

Claim 6.7. $\sum_{j=0}^{z-1} d_1(m)\,r_j(s_j)/\log r_j(s_j) \le d_3(m)\,n/\log n$.

In order to apply Claim 6.5 we need to satisfy $t_i \ge 3m$. Observe that by Claim 6.7,

$$t_i \ge t_0 \prod_{j=0}^{i-1} 2^{-d_1(m)\,r_j(s_j)/\log r_j(s_j)} \ge 2^{-d_3(m)n/\log n}\,t_0 \ge 3m \cdot 2^{-d_3(m)n/\log n + d_2(m)n/\log n} \ge 3m$$

(by the choice of $d_2(m), d_3(m)$ in Condition 11). Apply Claim 6.5 to $(U_i,V_i)$ to get label $s_i$ for step $i$ and $(U_{i+1},V_{i+1}) \subseteq (U_i,V_i)$. The first three invariants are maintained by the statement of Claim 6.5. The last invariant follows from Fact 4.4. Note that by the inequality we just established, $t_z \ge 2^{-d_3(m)n/\log n}\,t_0$. Also, by the stopping condition, $cr_z(s') \le 2$ for some $s' \mid m$, $s' \ge 2$. Thus, applying Claim 4.10 we get another matching vector family $(U',V') \subseteq (U_z,V_z) \subseteq (U,V)$ such that

• $|(U',V')| \ge t_z/m^2 \ge 2^{-2\log m - d_3(m)n/\log n}\,|(U,V)| \ge 2^{-d_2(m)n/\log n}\,|(U,V)|$ (Condition 7 satisfied by the choice of $d_2(m)$ and $d_3(m)$).

• $P^{(s')}_{U',V'}$ is the all zero matrix.

This finishes the proof of Claim 6.6. □



Proof of Claim 6.7. Let $t_s$ be the number of steps with label $s$. Note that as the column rank modulo $s$ goes down by a factor of at least $3/4$ each time we are in a step labeled $s$, it is easy to see that $t_s \le \log_{4/3} cr_0(s) \le \log_{4/3} n$. We shall rely on the monotonic increasing nature of $x/\log x$ when $x > e$. As $cr_i(s) > 2$, by Claim 4.6 $r_i(s) \ge cr_i(s) > 2$, which means $r_i(s) \ge 3 > e$ as the rank is always an integer. We thus have

< d\ (m)logmN — - — - (by Claim |4T6|) and monotonicity of x/logx as discussed above 

^ log (si) 



cr (s) 



U°g4/3 n 0)J 

< d\ (m) log m / > 

S |^> 2 \ (4/3)'" 1 log (cr (,) / (4/3)>" ] 

< d\ (m)logm d^cro (s) /logcro (s) (by Claim IA7T1 and Condition 9 satisfied by di) 

s\m,s>2 

< d\ (m) log to d^n/logn (as cro (s) < ro (s) < ro (m) < n, by Claim I4TB1 and FactH? 

s|m,s>2 

< d&d\ (m)m (log m) n/ log n 

< c?3 (m) nj log n (by the choice of <i 3 (m), Condition 8) 

This completes the proof. □ 



7 Monochromatic rectangles from low rank matrices 

In this section we prove Lemma 6.3 (the Sub-Matrix Lemma). We begin with some preliminary definitions. The following is a standard result in algebra and can be found in any introductory text.

Theorem 7.1 (Fundamental Theorem of finitely generated abelian groups). Every finitely generated abelian group $G$ is isomorphic to a direct product of cyclic groups of prime power order and finitely many copies of the infinite cyclic group. More precisely,

\[ G \cong \mathbb{Z}^n \times \mathbb{Z}_{q_1} \times \cdots \times \mathbb{Z}_{q_r}, \]

where the $q_i$'s are prime powers with $q_1 \le q_2 \le \cdots \le q_r$. The decomposition is unique after applying this ordering on the $q_i$'s. If the group $G$ is finite, then $n = 0$.
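
As a small illustration of the theorem for the finite cyclic group $\mathbb{Z}_m$, the following Python sketch lists the prime power orders $q_i$ and checks the isomorphism by the Chinese remainder map. The helper names `prime_power_orders` and `crt_image` are hypothetical and chosen only for this example.

```python
def prime_power_orders(m):
    """Return the sorted prime powers q_1 <= ... <= q_r with
    Z_m isomorphic to Z_{q_1} x ... x Z_{q_r} (empty list for m = 1)."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    orders = []
    p = 2
    while p * p <= m:
        if m % p == 0:
            q = 1
            while m % p == 0:   # collect the full power of p dividing m
                m //= p
                q *= p
            orders.append(q)
        p += 1
    if m > 1:                   # one prime factor larger than sqrt(m) may remain
        orders.append(m)
    return sorted(orders)


def crt_image(x, orders):
    """Image of x in Z_{q_1} x ... x Z_{q_r} under reduction modulo each q_i."""
    return tuple(x % q for q in orders)
```

For instance, `prime_power_orders(12)` gives the orders 3 and 4, and `crt_image` is injective on `range(12)`, which is the isomorphism $\mathbb{Z}_{12} \cong \mathbb{Z}_3 \times \mathbb{Z}_4$.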

We will use the following two definitions regarding sumsets. 

Definition 7.2 (Difference Set). For $A \subseteq \mathbb{Z}_m^n$ define its difference set as $A - A = \{a - a' \mid a, a' \in A\}$.

Definition 7.3 ($\mathrm{rep}_S(x)$). For any $S \subseteq \mathbb{Z}_m^n$ and $x \in \mathbb{Z}_m^n$, $\mathrm{rep}_S(x)$ is the number of different representations of $x$ as an expression of the form $s - s'$ where $s, s' \in S$.
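
The two definitions can be made concrete with a short Python sketch. It assumes vectors are given as tuples with entries in $\{0, \ldots, m-1\}$ and that representations are counted as ordered pairs $(s, s')$; the function name `difference_counts` is hypothetical.

```python
from collections import Counter
from itertools import product


def difference_counts(S, m):
    """Map every x in S - S (over Z_m^n) to rep_S(x), the number of
    ordered pairs (s, s') in S x S with s - s' = x."""
    counts = Counter()
    for s, s2 in product(S, repeat=2):
        diff = tuple((a - b) % m for a, b in zip(s, s2))
        counts[diff] += 1
    return counts
```

The keys of the returned counter form the difference set $S - S$, and the values always sum to $|S|^2$.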

Next, we define the $\epsilon$-spectrum of $B$ with respect to a primitive root of unity of order $m$.

Definition 7.4 (Spectrum). For $B \subseteq \mathbb{Z}_m^n$ and $\epsilon \in [0,1]$, the $\epsilon$-spectrum of $B$ with respect to $\omega$, a primitive root of unity of order $m$, is the set

\[ \mathrm{Spec}_\epsilon(B) = \left\{ x \in \mathbb{Z}_m^n : \left| E_{b \in B}\left[\omega^{\langle x, b\rangle}\right] \right| \ge \epsilon \right\}. \]

When $\omega$ is implicit in the context, we will drop the phrase "with respect to $\omega$".
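
For very small parameters the spectrum can be computed by brute force. The sketch below assumes $\omega = e^{2\pi i/m}$ and reads the defining condition as a lower bound on the absolute value of the average; the names `bias` and `spectrum` and the floating point tolerance are choices made for this example only.

```python
import cmath
from itertools import product


def bias(x, B, m):
    """|E_{b in B} omega^{<x,b>}| with omega = exp(2*pi*i/m)."""
    omega = cmath.exp(2j * cmath.pi / m)
    total = sum(omega ** (sum(a * b for a, b in zip(x, v)) % m) for v in B)
    return abs(total / len(B))


def spectrum(B, m, n, eps):
    """All x in Z_m^n whose bias against B is at least eps (brute force)."""
    tol = 1e-12  # guards against floating point rounding at the threshold
    return [x for x in product(range(m), repeat=n) if bias(x, B, m) >= eps - tol]
```

With $B = \{(1,0), (1,1)\} \subseteq \mathbb{Z}_3^2$, the vectors whose second coordinate is zero have bias 1, and every other vector has bias $1/2$.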

We start by proving the following lemma which is a generalization of a lemma from [BSLZ11].

Lemma 7.5. Let $A, B \subseteq \mathbb{Z}_m^n$ be sets. Let $\omega$ be a primitive root of unity of order $m$. If $A \subseteq \mathrm{Spec}_\epsilon(B)$, then there exist sets $A' \subseteq A$, $B' \subseteq B$ with $|A'| \ge |A|/m$ and $|B'| \ge \epsilon^2 \frac{|A|}{|\mathrm{span}(A)|} |B|$ such that $D_\omega(A',B') = 1$.

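Proof. We start by setting up some notation. Let $W = \mathrm{span}(A)$ be the subgroup of $\mathbb{Z}_m^n$ spanned by $A$. By Theorem 7.1 there exists an isomorphism $\tau : \prod_{i=1}^r \mathbb{Z}_{q_i} \to W$. Let $C = \prod_{i=1}^r \mathbb{Z}_{q_i}$ and note that we can think of elements of $C$ as vectors with integer coordinates where the $i$'th coordinate is in $\mathbb{Z}_{q_i}$. Let $e_1, e_2, \ldots, e_r \in C$ where $e_i$ is the vector that has 1 in the $i$'th coordinate and 0 everywhere else. Given $x \in C$, there exist $\alpha_1, \ldots, \alpha_r$ with $\alpha_i \in \mathbb{Z}_{q_i}$ such that

\[ x = \sum_{i=1}^r \alpha_i e_i. \]

Then $\tau(x) = \sum_{i=1}^r \alpha_i \tau(e_i)$. Let $v_i = \tau(e_i)$ for $1 \le i \le r$. We can think of the $v_i$'s as a basis of $W$. Therefore, for $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_r) \in C$ we have $\tau(\alpha) = \sum_{i=1}^r \alpha_i v_i$. Let

\[ \mathcal{G} = \{ (\beta_1, \ldots, \beta_r) \in \mathbb{Z}_m^r \mid \exists u \in \mathbb{Z}_m^n \text{ such that } \forall i,\ \beta_i = \langle v_i, u \rangle \}. \]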

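Claim 7.6. For $1 \le i \le r$, $q_i v_i \equiv 0 \pmod m$.

Proof. Let $x = 0 \in C$. Now $\tau(x) \equiv 0 \pmod m$. Note that $x$ can also be written as $x = q_i e_i$. Applying $\tau$ on both sides, we get $\tau(x) = q_i v_i$. Thus, $q_i v_i \equiv 0 \pmod m$. □

Claim 7.7. For $\beta \in \mathcal{G}$, $1 \le i \le r$, $q_i \beta_i \equiv 0 \pmod m$.

Proof. As $\beta \in \mathcal{G}$, there is a $u \in \mathbb{Z}_m^n$ such that $\forall i$, $\beta_i = \langle v_i, u \rangle$. Then, $q_i \beta_i = q_i \langle v_i, u \rangle \equiv 0 \pmod m$ by Claim 7.6. □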

For $\alpha \in C$, $\beta \in \mathcal{G}$ we define their inner product $\langle \alpha, \beta \rangle \in \mathbb{Z}_m$ by considering $\alpha_i \in \{0, \ldots, q_i - 1\}$, $\beta_i \in \{0, \ldots, m-1\}$, taking the inner product over the integers and then reducing the result modulo $m$. This is indeed an inner product by Claim 7.7.



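Claim 7.8. Given $\beta \in \mathcal{G} \setminus \{0\}$,

\[ \sum_{\alpha \in C} \omega^{\langle \alpha, \beta \rangle} = 0. \]

Proof. Let $\beta_i \ne 0$. Then $\sum_{\alpha \in C} \omega^{\langle \alpha, \beta \rangle} = 0$ whenever $\sum_{\alpha_i \in \mathbb{Z}_{q_i}} \omega^{\alpha_i \beta_i} = 0$. Now,

\[ \sum_{\alpha_i = 0}^{q_i - 1} \omega^{\alpha_i \beta_i} = \frac{1 - \omega^{q_i \beta_i}}{1 - \omega^{\beta_i}}. \]

This is well defined because $\omega$ is of order $m$ and $\beta_i \ne 0$. The claim now follows from Claim 7.7, which makes the expression zero. □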
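
The vanishing of the geometric sum in the proof above can be checked numerically. The sketch assumes $\omega = e^{2\pi i/m}$ and treats a single coordinate with $q\beta \equiv 0 \pmod m$ and $\beta \not\equiv 0 \pmod m$; the function name `char_sum` is hypothetical.

```python
import cmath


def char_sum(q, beta, m):
    """Sum of omega^(alpha * beta) over alpha = 0, ..., q - 1,
    where omega = exp(2*pi*i/m)."""
    omega = cmath.exp(2j * cmath.pi / m)
    return sum(omega ** ((alpha * beta) % m) for alpha in range(q))
```

For $m = 12$, $q = 4$, $\beta = 3$ the summands are the four fourth roots of unity, so the sum vanishes up to rounding. For $\beta = 1$ the condition $q\beta \equiv 0 \pmod m$ fails and the sum is nonzero.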
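With the above setup in place, we can now proceed with the proof of Lemma 7.5. For $\beta \in \mathcal{G}$, define

\[ S_\beta = \{ x \in \mathbb{Z}_m^n \mid \langle v_i, x \rangle = \beta_i,\ 1 \le i \le r \}. \]

Denoting $\mu(\beta) = \Pr_{b \in B}[b \in S_\beta]$, we observe that $\bigcup_{\beta \in \mathcal{G}} (B \cap S_\beta) = B$. Hence, $\sum_{\beta \in \mathcal{G}} \mu(\beta) = 1$. For $a \in W$, define $h(a) = E_{b \in B}\left[\omega^{\langle a, b \rangle}\right]$. If $a = \sum_{i=1}^r \alpha_i v_i$ then

\[ h(a) = E_{b \in B}\left[\omega^{\sum_{i=1}^r \alpha_i \langle v_i, b \rangle}\right] = \sum_{\beta \in \mathcal{G}} \mu(\beta)\, \omega^{\langle \alpha, \beta \rangle}. \]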



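We will prove upper and lower bounds for the sum $\sum_{a \in A} |h(a)|^2$. On the one hand,

\[ \sum_{a \in A} |h(a)|^2 \ge \frac{1}{|A|} \Big( \sum_{a \in A} |h(a)| \Big)^2 \quad \text{(Cauchy-Schwarz inequality)} \]
\[ \ge \frac{1}{|A|} \Big( \sum_{a \in A} \epsilon \Big)^2 \quad (A \subseteq \mathrm{Spec}_\epsilon(B) \text{ implies } |h(a)| \ge \epsilon) \]
\[ = |A|\, \epsilon^2. \]

On the other hand, writing each $a \in W$ as $a = \tau(\alpha)$ with $\alpha \in C$,

\[ \sum_{a \in A} |h(a)|^2 \le \sum_{a \in W} |h(a)|^2 \]
\[ = \sum_{a \in W} \sum_{\beta, \beta' \in \mathcal{G}} \mu(\beta)\mu(\beta')\, \omega^{\langle \alpha, \beta - \beta' \rangle} \]
\[ = \sum_{\beta, \beta' \in \mathcal{G}} \mu(\beta)\mu(\beta') \sum_{\alpha \in C} \omega^{\langle \alpha, \beta - \beta' \rangle} \]
\[ = \sum_{\beta \in \mathcal{G}} \mu(\beta)^2\, |W| \quad \text{(Claim 7.8)} \]
\[ \le |W| \max_{\beta \in \mathcal{G}} \{\mu(\beta)\}. \]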

Now, combining the upper and lower bounds, $\max_{\beta \in \mathcal{G}} \{\mu(\beta)\} \ge \epsilon^2 \frac{|A|}{|W|}$. Thus, there exists a $\beta \in \mathcal{G}$ such that $\mu(\beta) \ge \epsilon^2 \frac{|A|}{|W|}$. This means that the subset $B' = B \cap S_\beta$ is of size at least $\epsilon^2 \frac{|A|}{|W|} |B|$. Now, any $a \in A$ can be written as $a = \sum_{i=1}^r \alpha_i v_i$, and for $b \in B'$ the inner product $\langle a, b \rangle = \langle \alpha, \beta \rangle$ is independent of $b$. Now, for $i \in [m]$, let $A_i \subseteq A$ be such that for $a \in A_i$, for all $b \in B'$, $\langle a, b \rangle = \langle \alpha, \beta \rangle = i$. Now there exists some $A_i$, call it $A'$, of size at least $|A|/m$ such that $\langle a', b' \rangle = i$ for all $a' \in A'$, $b' \in B'$, that is, $D_\omega(A', B') = 1$, and this proves the lemma. □
We continue along the lines of [BSLZ11] and prove the following lemma.

Lemma 7.9. Suppose the twin free lists $U, V \subseteq \mathbb{Z}_m^n$ satisfy $D_\omega(U, V) \ge \epsilon$ where $\omega$ is a primitive root of unity of order $m$. Also, let $\mathrm{rank}(P_{U,V}) = r$. Then assuming the polynomial Freiman-Ruzsa conjecture, for every $K \ge 1$, letting $\ell = r/\log_m K$, there exist lists $U' \subseteq U$, $V' \subseteq V$ such that $D_\omega(U', V') = 1$, and

\U'\>poly m (&$-) (2mr)- £ \U\, \V'\ > poly m ( ) m,- £ \V\. 



Proof. Let $U = (u_1, \ldots, u_t)$ and $V = (v_1, \ldots, v_t)$. Since $P_{U,V}$ has rank $r$ there exists a $t \times r$ matrix $U_m$ and an $r \times t$ matrix $V_m$ so that $U_m V_m = P_{U,V}$. Thus if we let $A$ denote the rows of $U_m$ and $B$ denote the columns of $V_m$, then $A, B \subseteq \mathbb{Z}_m^r$. The proof does not care about the order of elements and hence we now consider $A, B$ which are sets. Note that $|A| = |B| = t$ and if $A = (a_1, \ldots, a_t)$ and $B = (b_1, \ldots, b_t)$ then $\langle a_i, b_j \rangle = \langle u_i, v_j \rangle$ for $1 \le i, j \le t$. Thus, $D_\omega(U, V) \ge \epsilon$ implies $D_\omega(A, B) \ge \epsilon$.

Following [BSLZ11] consider a sequence of constants $\epsilon_1 = \epsilon/2$, $\epsilon_2 = \epsilon_1^2/2$, $\epsilon_3 = \epsilon_2^2/2, \ldots$ and a sequence of sets $A_1 = A \cap \mathrm{Spec}_{\epsilon_1}(B)$ and $A_i \subseteq (A_{i-1} - A_{i-1}) \cap \mathrm{Spec}_{\epsilon_i}(B)$. The way the subsets are chosen for the $A_i$'s will be made precise shortly. Now by the pigeonhole principle, there exists a minimal index $\ell \le r/\log_m K$ such that $|A_{\ell+1}| \le K |A_\ell|$. To give a precise definition of the $A_i$'s, we have the following. Let $A_1 = A \cap \mathrm{Spec}_{\epsilon/2}(B)$. For $i \ge 2$, assuming $\epsilon_{i-1}$ and $A_{i-1}$, let $j_i$ be the integer index which maximizes the size of

\[ \{ (a, a') \in A_{i-1} \times A_{i-1} \mid a - a' \in \mathrm{Spec}_{\epsilon_i}(B) \text{ and } m^{j_i} \le \mathrm{rep}_{A_{i-1}}(a - a') \le m^{j_i+1} \}, \]

and let

\[ A_i = \{ a - a' \mid a, a' \in A_{i-1},\ a - a' \in \mathrm{Spec}_{\epsilon_i}(B) \text{ and } m^{j_i} \le \mathrm{rep}_{A_{i-1}}(a - a') \le m^{j_i+1} \}. \]

Claim 7.10. For $i = 1$ we have $|A_1| \ge (\epsilon/2)|A|$. For $i > 1$ we have $\Pr_{a, a' \in A_{i-1}}[a - a' \in A_i] \ge \epsilon_i/r$ and additionally $|A_i| \ge \frac{\epsilon_i}{r\, m^{j_i+1}} |A_{i-1}|^2$.

Proof. The case of $i = 1$ follows from Markov's inequality. For larger $i$, we show that

\[ \Pr_{a, a' \in A_{i-1}}[a - a' \in \mathrm{Spec}_{\epsilon_i}(B)] \ge \epsilon_i. \]

This follows from the fact that 



<-i< 



E, 



6eB,aGAi_i 



Co' 



(a,6> 



< E 



beB 



a-' 



<a,b> 



E a ,a'GAi_ilEb e B 



(a-a',6) 



Now applying Markov's inequality we get that $\Pr_{a, a' \in A_{i-1}}[a - a' \in \mathrm{Spec}_{\epsilon_i}(B)] \ge \epsilon_i = \epsilon_{i-1}^2/2$. Now selecting $j_i$ as in the construction gives that $\Pr_{a, a' \in A_{i-1}}[a - a' \in A_i] \ge \epsilon_i/r$.

To prove the second part of the claim, observe that by the above, we have shown that

\[ |\{ (a, a') \in A_{i-1} \times A_{i-1} \mid a - a' \in A_i \}| \ge \frac{\epsilon_i}{r} |A_{i-1}|^2. \]

Also, by construction of $A_i$, since every $x \in A_i$ can be represented as $x = a - a'$ with $a, a' \in A_{i-1}$ in at most $m^{j_i+1}$ ways, we have that

\[ |A_i| \ge \frac{\epsilon_i}{r\, m^{j_i+1}} |A_{i-1}|^2. \]



This completes the proof. 



□ 



Below we will use the following additive-combinatorics lemma. 
Theorem 7.11 ([BS94, Gow98]). There exists an absolute constant $c > 0$ such that the following holds. Let $A$ be any arbitrary subset of an abelian group $G$. Let $S \subseteq G$ be such that $|S| \le C|A|$. If $\Pr_{a, a' \in A}[a - a' \in S] \ge 1/C$, then there exists a subset $A' \subseteq A$ such that $|A'| \ge |A|/C^c$ and $|A' - A'| \le C^c |A|$.

Now we come to the main claim. 

Claim 7.12. For i = £, t - 1, • • • 1 there exist subsets A\ C JB| C £ suc/i that D u {A'^B'A = 1 
and 

and 

w/iere a* = poZy m (^-) (2mr)" (f_!) (llj=i e i+i) > A = P<%m {tr-) m~^~ 1 ^ 

Base Case. The base case of i = i is proved by an application of the Balog-Szemeredi-Gowers 

theorem followed by Conjectured] followed by Lemma [7.51 To see this, we know that |-A^+i| < 

and Pr 0)tl / e ^[a — a! G -A^-i] > e^+i/r. Hence by Theorem 17. Ill (with C = jf^), there exists a set 

^' C A £ such that > poly (^-) \A e \ and \A'[ - A'[\ < poly (j^) 141- Now by Conjecture!] 
applied to there exists a set A"' C ^ such that \A^\ > poly m {^f) \A'j, \ and |span(^")| < 
m\A"\ = poly m I4"l- (Note the extra factor of m in front of \A"\ as we get a coset of size 

\A'f\ and its span incurs an additional factor of m) Also, as A'" G Spec € (B), applying Lemma [731 
to A'l' and 5, we get A' e C and 5{CB such that B£) = 1, |^| > poly m {^f) \ A(\ and 

\Bg\ > poly m \B\. This completes the base case. Let us come to the inductive case. □ 

Inductive Case. Suppose the statement is true for $i$ and let us argue for $i - 1$. Let $G = (A_{i-1}, E)$ be the graph whose vertices are the elements in $A_{i-1}$ and $(a, a')$ is an edge if $a - a' \in A'_i$. Now,

\[ |E| \ge m^{j_i} \alpha_i |A_i| \quad \text{(inductive hypothesis)} \]
\[ \ge m^{j_i} \alpha_i \frac{\epsilon_i}{r\, m^{j_i+1}} |A_{i-1}|^2 \quad \text{(Claim 7.10)} \]
\[ = 2\alpha_{i-1} |A_{i-1}|^2. \]

Now the graph has at least $2\alpha_{i-1}|A_{i-1}|^2$ edges and $|A_{i-1}|$ vertices and therefore has a connected component of size at least $2\alpha_{i-1}|A_{i-1}|$ vertices. Let us call these vertices $A''_{i-1}$. Let $a$ be any element of $A''_{i-1}$. Partition $B'_i$ into $B'_{i,j}$ for $0 \le j \le m-1$ such that all elements of $B'_{i,j}$ have inner product $j$ with $a$. Let $B'_{i-1} = B'_{i,j_1}$ be the largest of them. Note that $|B'_{i-1}| \ge |B'_i|/m$. By assumption $D_\omega(A'_i, B'_i) = 1$. Hence, $D_\omega(A'_i, B'_{i-1}) = 1$. Therefore, for some $j_2$, $\langle a, b \rangle = j_2$ for all $a \in A'_i$ and $b \in B'_{i-1}$. Now, in the connected component obtained above, whenever $a, a' \in A''_{i-1}$ are neighbours, $\langle a - a', b \rangle = j_2$ for $b \in B'_{i-1}$. Thus, starting with $a$ as the anchor and propagating throughout the connected component, we can classify the vertices in $A''_{i-1}$ based on the inner product it has with all elements in $B'_{i-1}$, which is either $j_1$ or $j_2 - j_1$. Pick the larger set and call it $A'_{i-1}$. Hence, $D_\omega(A'_{i-1}, B'_{i-1}) = 1$. Thus, $|A'_{i-1}| \ge |A''_{i-1}|/2 \ge \alpha_{i-1}|A_{i-1}|$ and $|B'_{i-1}| \ge |B'_i|/m \ge \frac{\beta_i}{m}|B| = \beta_{i-1}|B|$.

This completes the inductive case. □
Put i = 1 in the above claim. Also observe that as f-j+i = e v /2 23 - 1 > (e/2) 2 \ Thus, 
e e+ i > (e/2) 2 " and ]J e j=1 e j+1 > (e/2) 2<+1 there exist A' C A x C A, B' C S, such that |A'| > 

po/y ^ J (2mr) - ^ |A| and |-B'| > po/y i ^^K J rn~ e \B\. Observing that the lower bounds 
grow weaker with increasing £,and that I < £' = r j log m K we get \A'\ > poly f — 1 (2mr)~^ |A| 

and > poly — j m~~ e ' \B\ where £' = r/\og m K. Therefore, if we take the list U' C U 

(corresponding to A' C A) and V' C V (corresponding to B' C £>) then as (ai,bj) = (ui,Vj) the 
statement of the lemma follows. This completes the proof of Lemma 17.91 □ 



We can now prove the Sub-Matrix Lemma, Lemma 6.3.

Proof of Lemma [6J2 Set K = s 4r / log V = ^p, e = l/2m 3 / 2 while applying Lemma ES over 
Z s . We get |A'| > <5 S |A|, > 5 S \B\ where 

(5 S = poly s ( ) 2~ Cl ( s - )r / logr (for some constant c\ (s) depending only on s) 

\m r J 

> poly m (-^]2-^ s >l^ 
\m' / 

Now let c 2 (m) = max s , ms>2 {ci (s)}. Thus, <5 S > po/y m f^r) 2- C2 ( m ) r / lo s r > 2~ c ( m ) r / lo e r for 
some constant c that depends only on m. □ 
A A Calculation 

Claim A.l. Let b> l,n > 2 be arbitrary integers. Then 

g log (n/b^) ^ m/l ° gn 
where f(b) = ™ + ^ + JJl When b = 4/3, f{b) < 300. 

Proof. We divide the summation into two parts. The first part consists of the first $\lfloor \log_b \log n \rfloor$ terms and the second part consists of the remaining terms.

In the first part, ,.■ 1 . 1 1 < ,, , n \, whenever n > 2 and hence the first part summation 

1 ' o l_1 log n/b % 1 — b % i 0.11ogn — 1 

is bounded from above by ( fe _|)°f ogn - 

In the second part of the summation, we use the monotonicity of x\og{n/x). The function 
increases with x as long as x < n/e. Therefore, for terms with b l ~ 1 < n/e, the maximum value 
of each summand is given by substituting i = log b log n which gives an upper bound of - - ^ a - . 

The remaining terms corresponding to n/b > > n/e (note that these extra terms arise only if 
b < e) can be analysed as follows. Observe that each summand in that range can be upperbounded 
by , e , . Therefore, we have at most loff!,n terms each at most --^ 1 r~r- Thus, the second 

J n log 6 ' °» log^ n n log 6 ' 

part of the summation is bounded from above by log b n ( ■— — h 



log 2 n n logb J " 

10 e \ 10 1 e logn 

log 2 n n log b) log b log n log 2 b n 

10 1 e 16 . , 2 , 

< 1 7)— (as 16n > log n) 

log b log ?i log z b log n 

10 16e \ 1 

+ 



log b log 2 bj logn 

This completes the proof. □ 
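
A quick numerical sanity check of the kind of bound proved in this appendix can be sketched in Python. The sketch rests on assumptions: it reads the sum in Claim A.1 as $\sum_{i=1}^{\lfloor \log_b n \rfloor} \frac{n/b^{i-1}}{\log (n/b^{i-1})}$, which matches how the claim is used in the proof of Claim 6.7, and it takes logarithms to base 2. The function names are hypothetical.

```python
import math


def shrinking_sum(n, b):
    """Sum over i = 1..floor(log_b n) of x_i / log2(x_i), x_i = n / b**(i-1).
    Assumed reading of the sum bounded in Claim A.1."""
    terms = math.floor(math.log(n, b))
    total = 0.0
    for i in range(1, terms + 1):
        x = n / b ** (i - 1)   # x >= b > 1 for every index in range
        total += x / math.log2(x)
    return total


def ratio_to_n_over_log_n(n, b):
    """The sum divided by n / log2(n); Claim A.1 bounds this ratio by f(b)."""
    return shrinking_sum(n, b) / (n / math.log2(n))
```

For $b = 4/3$ the ratio stays far below the constant 300 quoted in the claim for the sample values of $n$ tried in the test, which is consistent with the claim but of course does not prove it.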



B Proofs of Two Probability Lemmas 
B.1 Proof of Lemma I2A31 

Let $f : \mathbb{Z}_m \to \mathbb{C}$ be any function. Recall that, for $0 \le j \le m-1$, the Fourier coefficients of $f$ are given by

\[ \hat{f}(j) = \frac{1}{m} \sum_{x \in \mathbb{Z}_m} f(x) \exp(-2\pi i j x/m). \]

It is well known that the set of functions $\{\exp(2\pi i j x/m)\}_{0 \le j \le m-1}$ is an orthonormal basis for all functions of the above form, and that $f$ can be expressed as

\[ f(x) = \sum_{j=0}^{m-1} \hat{f}(j) \exp(2\pi i j x/m). \]

Let us consider $f : \mathbb{Z}_m \to [0, 1]$. Thus, Parseval's identity states that

\[ \sum_{j=0}^{m-1} |\hat{f}(j)|^2 = \frac{1}{m} \sum_{x \in \mathbb{Z}_m} |f(x)|^2 \le 1. \]
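Observe that as $U_m(x) = 1/m$ is the constant function, $\hat{U}_m(j) = 0$ for $j \ne 0$. Also, for any distribution $\mu$, $\hat{\mu}(0) = 1/m$. Now

\[ 2\epsilon \le \sum_{x \in \mathbb{Z}_m} |\mu(x) - U_m(x)| \]
\[ \le \sqrt{m} \sqrt{\sum_{x \in \mathbb{Z}_m} |\mu(x) - U_m(x)|^2} \quad \text{(Cauchy-Schwarz inequality)} \]
\[ = \sqrt{m} \sqrt{m \sum_{j=0}^{m-1} |\hat{\mu}(j) - \hat{U}_m(j)|^2} \quad \text{(Parseval's identity)} \]
\[ = m \sqrt{\sum_{j=1}^{m-1} |\hat{\mu}(j)|^2} \quad (\hat{U}_m(j) = 0 \text{ for } j \ne 0, \text{ and } \hat{\mu}(0) = \hat{U}_m(0) = 1/m) \]
\[ \le m^{3/2} \max_{j \ne 0} \{|\hat{\mu}(j)|\}. \]

Thus, for some $j \ne 0$, we have

\[ |\hat{\mu}(j)| \ge \frac{2\epsilon}{m^{3/2}}. \]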



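Observe that

\[ \hat{\mu}(j) = \frac{1}{m} \sum_{x \in \mathbb{Z}_m} \mu(x) \exp(-2\pi i j x/m) = \frac{1}{m} E_{x \sim \mu}\left[\exp(-2\pi i j x/m)\right]. \]

Let $j' = m - j$. Thus, $|\hat{\mu}(j)| \ge \frac{2\epsilon}{m^{3/2}}$ implies that

\[ \left| E_{x \sim \mu}\left[\exp(2\pi i j' x/m)\right] \right| \ge \frac{2\epsilon}{\sqrt{m}}. \]

This concludes the proof.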



□ 
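
The chain of inequalities in this proof can be checked numerically on a small example. The sketch below uses the normalization $\hat{f}(j) = \frac{1}{m}\sum_x f(x)\exp(-2\pi i j x/m)$ from this section; the function names and the sample distribution are illustrative choices, not taken from the paper.

```python
import cmath


def fourier_coefficients(f):
    """Fourier coefficients of f: Z_m -> C given as a list of length m,
    normalized by 1/m as in this section."""
    m = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * j * x / m) for x in range(m)) / m
        for j in range(m)
    ]


def l1_distance_from_uniform(mu):
    """Sum over x of |mu(x) - 1/m| for a distribution mu on Z_m."""
    m = len(mu)
    return sum(abs(p - 1 / m) for p in mu)


def largest_nontrivial_coefficient(mu):
    """max over j != 0 of |mu_hat(j)|."""
    return max(abs(c) for c in fourier_coefficients(mu)[1:])
```

For the distribution $(1/2, 1/4, 1/4, 0)$ on $\mathbb{Z}_4$ the $\ell_1$ distance from uniform is $1/2$, while $m^{3/2}$ times the largest nontrivial coefficient is $1$, in agreement with the bound derived above.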
B.2 Proof of Lemma I2A41 



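\[ \epsilon^2 \le \Big| E_{w_1 \sim \mu_1,\, w_2 \sim \mu_2}\left[\omega^{\langle w_1, w_2 \rangle}\right] \Big|^2 \]
\[ \le \Big( \sum_{w_1} \mu_1(w_1) \Big| \sum_{w_2} \mu_2(w_2)\, \omega^{\langle w_1, w_2 \rangle} \Big| \Big)^2 \]
\[ \le \Big( \sum_{w_1} \mu_1(w_1)^2 \Big) \sum_{w_1} \Big| \sum_{w_2} \mu_2(w_2)\, \omega^{\langle w_1, w_2 \rangle} \Big|^2 \]
\[ = \mathrm{cp}(\mu_1) \sum_{w_1} \sum_{w_2, w_2'} \mu_2(w_2)\mu_2(w_2')\, \omega^{\langle w_1, w_2 - w_2' \rangle} \]
\[ = \mathrm{cp}(\mu_1) \sum_{w_2} \mu_2(w_2)^2\, m^n \]
\[ = m^n \cdot \mathrm{cp}(\mu_1)\, \mathrm{cp}(\mu_2). \]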